A Joint Model to Simultaneously Identify and Align Bilingual Named Entities

Authors

  • Yufeng Chen
  • Chengqing Zong
  • Keh-Yih Su

Abstract

We observe that (1) how a given named entity (NE) is translated (i.e., either semantically or phonetically) depends greatly on its associated entity type, and (2) the entities within an aligned pair should share the same type. In addition, (3) the initially detected NEs serve as anchors, whose information can be used to assign certainty scores when selecting candidates. On this basis, this paper derives an integrated model that jointly identifies and aligns bilingual named entities between Chinese and English. The model adopts a novel mapping type ratio feature (the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and uses additional monolingual candidate certainty factors (based on the NE anchors).
Experiments show that this approach greatly improves on the baseline system. After the various factors are further weighted according to their contributions, the model raises the type-insensitive F-score of identified NE-pairs on the testing set from 78.4% to 88.0% (a 44.4% F-score imperfection reduction), and the type-sensitive F-score from 68.4% to 83.0% (a 46.2% F-score imperfection reduction) in our Chinese-English NE alignment task. Furthermore, when semi-supervised learning is used to train the adopted English NE recognition model (with only 100 seed sentence-pairs), the proposed model boosts the type-sensitive F-score of English NE recognition from 36.7% to 47.4% (a 29.2% relative improvement) on the testing set.
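
The two kinds of percentages quoted in the abstract, and the mapping type ratio it defines, can be illustrated with a minimal Python sketch. The function names and the per-token labels "semantic" and "phonetic" are assumptions made for illustration only; they are not taken from the paper, and the sketch is not the authors' implementation. "Imperfection reduction" is read here as the share of the remaining gap to a 100% F-score that was closed, which reproduces the reported figures.

```python
def imperfection_reduction(before, after):
    """Fraction of the gap between `before` and a perfect 100% score that was closed."""
    return (after - before) / (100.0 - before)


def relative_improvement(before, after):
    """Gain expressed as a fraction of the starting score."""
    return (after - before) / before


def mapping_type_ratio(token_mapping_types):
    """Proportion of NE-internal tokens labeled as semantically translated.

    `token_mapping_types` is a hypothetical list with one label per token,
    each either "semantic" or "phonetic".
    """
    if not token_mapping_types:
        raise ValueError("an NE must contain at least one token")
    semantic = sum(1 for label in token_mapping_types if label == "semantic")
    return semantic / len(token_mapping_types)


if __name__ == "__main__":
    # F-scores reported in the abstract (in percent).
    print(round(100 * imperfection_reduction(78.4, 88.0), 1))  # type-insensitive
    print(round(100 * imperfection_reduction(68.4, 83.0), 1))  # type-sensitive
    print(round(100 * relative_improvement(36.7, 47.4), 1))    # English NE recognition

    # Hypothetical four-token NE: one token transliterated, three translated.
    print(mapping_type_ratio(["phonetic", "semantic", "semantic", "semantic"]))
```

Under this reading, an NE whose tokens are all transliterated has a ratio of 0 and one whose tokens are all translated by meaning has a ratio of 1, which is how such a feature could separate, for example, person names from organization names.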

Published

2024-12-05

Issue

Section

Long paper