Feature-based Decipherment for Machine Translation

Authors

  • Iftekhar Naim Google
  • Parker Riley University of Rochester
  • Daniel Gildea University of Rochester

Abstract

Orthographic similarities across languages provide a strong signal for probabilistic decipherment, especially for closely related language pairs.
The existing decipherment models, however, are not well-suited for exploiting these orthographic similarities.
We propose a log-linear model with latent variables that incorporates orthographic similarity features.
Maximum likelihood training is computationally expensive for the proposed log-linear model.
To address this challenge, we perform approximate inference via MCMC sampling and contrastive divergence.
Our results show that the proposed log-linear model with contrastive divergence scales to large vocabularies and outperforms the existing generative decipherment models by exploiting the orthographic features.

Published

2024-12-05

Issue

Section

Short paper