Language Models for Machine Translation: Original vs. Translated Texts

Gennadi Lembersky; Noam Ordan; Shuly Wintner

Authors

Gennadi Lembersky
Noam Ordan
Shuly Wintner, Department of Computer Science, University of Haifa

Abstract

We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated into the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
