Language Models for Machine Translation: Original vs. Translated Texts

Authors

  • Gennadi Lembersky
  • Noam Ordan
  • Shuly Wintner, Department of Computer Science, University of Haifa

Abstract

We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
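One common way to quantify how well a language model predicts a set of sentences is perplexity, where a lower value means a better fit. The sketch below is only a toy illustration of that kind of comparison and is not the authors' experimental setup: it trains two add-one-smoothed unigram models, one on "original" text and one on "translated" text, and scores both on a translated reference. The corpora and the names `train_unigram` and `perplexity` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Count tokens in whitespace-tokenized sentences; return (counts, total)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())

def perplexity(model, sentences, vocab_size):
    """Perplexity of an add-one-smoothed unigram model on the given sentences."""
    counts, total = model
    log_prob = 0.0
    n_tokens = 0
    for s in sentences:
        for tok in s.split():
            # Add-one smoothing keeps unseen tokens from getting zero probability.
            p = (counts[tok] + 1) / (total + vocab_size)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)

# Hypothetical toy corpora (assumptions, not data from the paper).
original = ["the cat sat on the mat", "a dog barked loudly"]
translated = ["the cat has sat itself on the mat", "the dog has barked"]
reference = ["the dog has sat on the mat"]  # stands in for a translated reference set

# Shared vocabulary so both models are smoothed over the same event space.
vocab = {tok for s in original + translated + reference for tok in s.split()}

ppl_original = perplexity(train_unigram(original), reference, len(vocab))
ppl_translated = perplexity(train_unigram(translated), reference, len(vocab))

print(f"original-text LM perplexity:   {ppl_original:.2f}")
print(f"translated-text LM perplexity: {ppl_translated:.2f}")
```

On this toy data the model trained on translated text assigns the reference a lower perplexity, which mirrors the direction of the effect described in the abstract. A realistic comparison would use higher-order n-gram models and far larger corpora.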

Published

2024-12-05

Issue

Section

Short paper