Chinese Syntactic Processing using the Generalized Perceptron and Beam Search

Yue Zhang; Stephen Clark

Authors

Yue Zhang University of Oxford
Stephen Clark University of Cambridge

Abstract

We study Chinese syntactic processing using a general statistical framework, which consists of a global linear model trained by the generalized perceptron algorithm, together with a generic beam-search decoder. We apply the framework to Chinese word segmentation, joint word segmentation and POS-tagging, dependency parsing, and phrase-structure parsing, and show that state-of-the-art accuracies can be achieved for all of these tasks. One of the main advantages of our framework is the freedom to define arbitrary features that capture global statistical patterns, which leads to improved accuracy. For example, the framework enables a direct word-based approach to Chinese word segmentation, without mapping it to a character sequence tagging problem. We also apply the framework to joint segmentation and POS-tagging, with both types of features in a single model, and are able to develop an efficient decoder for the joint problem. For dependency parsing, we use the framework to develop a transition-based parser that also contains elements of a graph-based parser, combining both approaches into a single consistent system that outperforms each approach individually. For phrase-structure parsing, we use the framework to develop a global model for a shift-reduce parsing algorithm, in contrast to the current deterministic Chinese phrase-structure parsers. We conclude that the framework is a competitive choice for Chinese syntactic processing, and that it can be applied more generally to other NLP tasks and languages.
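To make the framework described in the abstract concrete, here is a minimal sketch of a global linear model trained with perceptron updates and decoded with beam search, applied to word-based segmentation. It is an illustration under stated assumptions, not the authors' implementation: the feature templates (word unigrams and word bigrams), the function names `features`, `score`, `decode`, and `train`, the beam size, and the toy training data are all hypothetical, and the plain perceptron update shown here omits any refinements the paper may use.

```python
from collections import defaultdict

def features(words):
    """Hypothetical feature templates: word unigrams and word bigrams."""
    feats = []
    prev = "<s>"
    for w in words:
        feats.append(("w", w))
        feats.append(("bi", prev, w))
        prev = w
    return feats

def score(weights, words):
    """Global linear model: sum of the weights of all features that fire."""
    return sum(weights[f] for f in features(words))

def decode(weights, sentence, beam_size=8):
    """Beam search over segmentations, consuming one character at a time."""
    beam = [[]]  # each candidate is a list of words covering the prefix read so far
    for ch in sentence:
        candidates = []
        for words in beam:
            candidates.append(words + [ch])  # start a new word
            if words:
                # append the character to the last word
                candidates.append(words[:-1] + [words[-1] + ch])
        # keep the highest-scoring candidates (stable sort breaks ties by order)
        candidates.sort(key=lambda ws: score(weights, ws), reverse=True)
        beam = candidates[:beam_size]
    return beam[0]

def train(data, epochs=5, beam_size=8):
    """Perceptron training: decode, then update weights when the output is wrong."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for gold in data:
            pred = decode(weights, "".join(gold), beam_size)
            if pred != gold:
                for f in features(gold):
                    weights[f] += 1.0  # reward features of the gold segmentation
                for f in features(pred):
                    weights[f] -= 1.0  # penalize features of the wrong output
    return weights

# Toy data (an assumption for illustration, not from the paper).
data = [["我", "爱", "北京"], ["北京", "很", "大"]]
weights = train(data)
print(decode(weights, "北京很大"))
```

Because candidates are whole word sequences rather than per-character tags, any feature over complete words can be scored directly, which is the property the abstract highlights for the word-based approach. On this toy data the trained model recovers both training segmentations.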

Published

2024-12-05

Issue

Past Publications

Section

Long Paper

