Use of Modality and Negation in Semantically-Informed Syntactic MT
Abstract
This paper describes the resource- and system-building efforts of an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, a publicly available MN lexicon, and two automated MN taggers that we built using the annotation scheme and lexicon. Our annotation scheme isolates three components of modality and negation: a trigger, a target, and a holder. We describe how our MN lexicon was produced semi-automatically, and we demonstrate that a structure-based MN tagger achieves precision of around 86% (depending on genre) when tagging a standard LDC data set.
We also present a unified and coherent syntactic framework that supports the use of modality, negation, and named entities in statistical machine translation. Syntactic tags enriched with modality, negation, and named-entity information are assigned to parse trees in the target-language training texts through a process of tree grafting. The resulting system significantly outperformed a linguistically naïve baseline model (Hiero) and reached the highest scores reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the machine-translation community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality. It further demonstrates that large gains can be achieved for low-resource languages whose word order differs from that of English.