Automatic Adaptation of Annotation Guidelines
Abstract
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines. This seems to be a great waste of human effort, and it would be desirable to adapt automatically from one annotation guideline to another. In this article we describe the problem of automatic adaptation of annotation guidelines, and present a series of successively enhanced models that can transform the knowledge in a manually annotated corpus to a different, desired annotation guideline. The effectiveness of the algorithms is validated on the task of Chinese word segmentation, where no segmentation guidelines are widely accepted due to the lack of morphology in Chinese. Annotation adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvement, and achieves state-of-the-art performance despite using classifiers with only local features.
Published
2024-12-05
Issue
Section
Short Paper