Source Language Adaptation Approaches for Resource-Poor Machine Translation
Abstract
Most of the world's languages are resource-poor for statistical machine translation; still, many of them are related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bi-text for a related resource-rich language RICH and the same target language TGT. We assume a small POOR-TGT bi-text, from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work can serve as a practical guideline for building machine translation systems for other resource-poor languages.
Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bi-text yields an improvement of 7.26 BLEU points over the unadapted one and of 3.09 BLEU points over the original small bi-text. Moreover, combining the small POOR-TGT bi-text with the adapted bi-text outperforms the corresponding combinations with the unadapted bi-text by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.