Use of Modality and Negation in Semantically-Informed Syntactic MT

Authors

  • Bonnie Jean Dorr
  • Kathryn Baker
  • Michael Bloodgood
  • Chris Callison-Burch
  • Nathaniel W. Filardo
  • Christine Piatko
  • Lori Levin
  • Scott Miller

Abstract

This paper describes the resource- and system-building efforts of an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, a publicly available MN lexicon, and two automated MN taggers that we built using the annotation scheme and lexicon. Our annotation scheme isolates three components of modality and negation: a trigger, a target, and a holder. We describe how our MN lexicon was produced semi-automatically, and we demonstrate that a structure-based MN tagger achieves precision of around 86% (depending on genre) when tagging a standard LDC data set.

We also present a unified and coherent syntactic framework that supports the use of modality, negation, and named entities in statistical machine translation. Syntactic tags enriched with modality, negation, and named-entity information are assigned to parse trees in the target-language training texts through a process of tree grafting. The resulting system significantly outperformed a linguistically naïve baseline model (Hiero) and reached the highest scores reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the machine-translation community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality. It further demonstrates that large gains can be achieved for low-resource languages whose word order differs from that of English.

Author Biography

  • Bonnie Jean Dorr

    Dr. Bonnie Dorr is a Professor in the Department of Computer Science and Institute for Advanced Computer Studies at the University of Maryland and Associate Dean for the College of Computer, Mathematical, and Natural Sciences. She received her Ph.D. degree in Computer Science, with a minor in Linguistics, from the Massachusetts Institute of Technology in 1990 and has been on the faculty at the University of Maryland since then. She is a leading researcher in the areas of semantically-informed machine translation, single- and multi-document summarization, and language understanding. She co-founded the Computational Linguistics and Information Processing (CLIP) Laboratory and served as its director for 15 years. She has carried out seminal work in cross-language divergence detection and has led several efforts in summarization, machine translation, paraphrasing, and automatic evaluation metrics. She was a founding member of the Johns Hopkins Human Language Technology Center of Excellence (COE), where she served as Principal Scientist and leader of the Language Understanding project for two years. She has served on the Executive Council of the Association for Artificial Intelligence and on the Executive Board of the Association for Computational Linguistics (ACL), and she was the organizer of the annual ACL conference in 1998. She is a Sloan Fellow (1993), an NSF Young Investigator (1994), a Maryland Distinguished Scientist (1996), and an NSF Presidential Faculty Fellow (1997), and she served as Vice-President and President of the Association for Computational Linguistics (2007-2008).

Published

2024-12-05

Issue

Section

Special Issue on Modality and Negation