Automatic Adaptation of Annotation Guidelines

Wenbin Jiang; Yajuan Lü; Liang Huang; Qun Liu

Authors

Wenbin Jiang Institute of Computing Technology, Chinese Academy of Sciences
Yajuan Lü Institute of Computing Technology, Chinese Academy of Sciences
Liang Huang Department of Computer Science, Queens College and Graduate Center, The City University of New York
Qun Liu Centre for Next Generation Localisation, Faculty of Engineering and Computing, Dublin City University; Institute of Computing Technology, Chinese Academy of Sciences

Abstract

Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines. This seems to be a great waste of human effort, and it would be desirable to adapt automatically from one annotation guideline to another. In this article we describe the problem of automatic adaptation of annotation guidelines, and present a series of successively enhanced models that can transform the knowledge in a manually annotated corpus to a different, desired annotation guideline. The effectiveness of the algorithms is validated on the task of Chinese word segmentation, where no segmentation guidelines are widely accepted due to the lack of morphology in Chinese. Annotation adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvement, and achieves state-of-the-art performance despite using classifiers with only local features.
