Automatic Adaptation of Annotation Guidelines

Authors

  • Wenbin Jiang Institute of Computing Technology, Chinese Academy of Sciences
  • Yajuan Lü Institute of Computing Technology, Chinese Academy of Sciences
  • Liang Huang Department of Computer Science, Queens College and Graduate Center, The City University of New York
  • Qun Liu Centre for Next Generation Localisation, Faculty of Engineering and Computing, Dublin City University; Institute of Computing Technology, Chinese Academy of Sciences

Abstract

Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines. This seems to be a great waste of human effort, and it would be desirable to automatically adapt one annotation guideline to another. In this article we describe the problem of automatic adaptation of annotation guidelines, and present a series of successively enhanced models that can transform the knowledge in a manually annotated corpus to a different, desired annotation guideline. The effectiveness of the algorithms is validated on the task of Chinese word segmentation, where no segmentation guideline is widely accepted due to the lack of morphology in Chinese. Annotation adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvement, and achieves state-of-the-art performance despite using classifiers with only local features.

Published

2024-12-05

Issue

Section

Short paper