Adapting Translation Models to Translationese

Authors

  • Gennadi Lembersky
  • Noam Ordan
  • Shuly Wintner, Department of Computer Science, University of Haifa

Abstract

Translation models used for statistical machine translation are compiled from parallel corpora. Such corpora are manually translated, and they are typically used under the assumption that parallel texts are symmetrical: the direction of translation is deemed irrelevant and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction.

We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the 'wrong' direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables, by adapting the translation model to the special properties of translationese. We explore two adaptation techniques. First, we create a mixture model by interpolating phrase tables trained on texts translated in the 'right' and the 'wrong' directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
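The first technique, a mixture of two phrase tables with a weight chosen by minimizing perplexity, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each phrase table is a plain dictionary mapping a (source phrase, target phrase) pair to a translation probability, that the weight is found by a simple grid search over a small held-out list of phrase pairs, and that unseen pairs receive a small probability floor. All function names, the toy data, and the floor value are hypothetical.

```python
import math

def interpolate(p_right, p_wrong, lam):
    """Linear mixture: lam * p_right + (1 - lam) * p_wrong for every phrase pair."""
    pairs = set(p_right) | set(p_wrong)
    return {
        pair: lam * p_right.get(pair, 0.0) + (1 - lam) * p_wrong.get(pair, 0.0)
        for pair in pairs
    }

def perplexity(p_right, p_wrong, lam, dev_pairs, floor=1e-9):
    """Perplexity of the mixture on held-out phrase pairs (floor is an assumption)."""
    mixed = interpolate(p_right, p_wrong, lam)
    log_sum = sum(math.log2(max(mixed.get(pair, 0.0), floor)) for pair in dev_pairs)
    return 2 ** (-log_sum / len(dev_pairs))

def best_weight(p_right, p_wrong, dev_pairs, steps=100):
    """Grid search over lam in [0, 1] for the weight with the lowest perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda lam: perplexity(p_right, p_wrong, lam, dev_pairs))

# Toy phrase tables (hypothetical data): 'right' and 'wrong' translation directions.
p_right = {("maison", "house"): 0.8, ("maison", "home"): 0.2}
p_wrong = {("maison", "house"): 0.5, ("maison", "home"): 0.5}
dev_pairs = [("maison", "house"), ("maison", "house"), ("maison", "home")]

lam = best_weight(p_right, p_wrong, dev_pairs)
mixed_table = interpolate(p_right, p_wrong, lam)
```

A grid search is used here only because it is easy to read; an iterative estimator such as expectation-maximization would be a common alternative for fitting mixture weights.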

Published

2024-12-05

Section

Long paper