Chinese Syntactic Processing using the Generalized Perceptron and Beam Search

Authors

  • Yue Zhang, University of Oxford
  • Stephen Clark, University of Cambridge

Abstract

We study Chinese syntactic processing using a general statistical framework that consists of a global linear model trained by the generalized perceptron algorithm, together with a generic beam-search decoder. We apply the framework to Chinese word segmentation, joint word segmentation and POS-tagging, dependency parsing, and phrase-structure parsing, and show that state-of-the-art accuracies can be achieved for all of these tasks. One of the main advantages of our framework is the freedom to define arbitrary features that capture global statistical patterns, which leads to improved accuracy. For example, the framework enables a direct word-based approach to Chinese word segmentation, without mapping it to a character sequence tagging problem. We also apply the framework to joint segmentation and POS-tagging, with both types of features in a single model, and are able to develop an efficient decoder for the joint problem. For dependency parsing, we use the framework to develop a transition-based parser that also contains elements of a graph-based parser, combining both approaches into a single consistent system that outperforms each approach individually. For phrase-structure parsing, we use the framework to develop a global model for a shift-reduce parsing algorithm, in contrast to the current deterministic Chinese phrase-structure parsers. We conclude that the framework is a competitive choice for Chinese syntactic processing, and that it can be applied more generally to other NLP tasks and languages.

Published

2024-12-05

Section

Long paper