Parsing Chinese Sentences with Grammatical Relations
Abstract
We report our work on building linguistic resources and statistical parsers for grammatical relation (GR) analysis of Chinese sentences. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way.
Accordingly, it is possible and reasonable to represent almost all grammatical relations as bi-lexical dependencies.
In this work, we propose to represent the grammatical information using general directed dependency graphs.
Not only local, but also rich long-distance dependencies are explicitly represented.
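To make the representation concrete, here is a minimal Python sketch of a bi-lexical dependency graph in which a word may have more than one head. The sentence, the arc labels (root, subj, comp, obj) and the helper names are hypothetical illustrations, not the paper's GR label set or data format; the second subj arc stands in for a non-local dependency that a tree could not hold alongside the first.

```python
from collections import defaultdict

# Hypothetical toy example: 他 想 去 北京 ("he wants to go to Beijing").
# Tokens are numbered from 1; index 0 is a virtual ROOT.
# Arcs are (head, dependent, label); the labels are illustrative only.
tokens = ["ROOT", "他", "想", "去", "北京"]
graph = {
    (0, 2, "root"),
    (2, 1, "subj"),  # 他 is the subject of 想
    (2, 3, "comp"),  # 去 is the complement of 想
    (3, 1, "subj"),  # 他 is also the understood subject of 去 (a second head)
    (3, 4, "obj"),   # 北京 is the object of 去
}


def heads_of(arcs):
    """Map each dependent index to the sorted list of its head indices."""
    heads = defaultdict(list)
    for head, dep, _label in arcs:
        heads[dep].append(head)
    return {dep: sorted(hs) for dep, hs in heads.items()}


def is_single_headed(arcs, n):
    """True iff every token 1..n has exactly one head (a necessary, not
    sufficient, condition for the arcs to form a tree)."""
    heads = heads_of(arcs)
    return all(len(heads.get(d, [])) == 1 for d in range(1, n + 1))


print(heads_of(graph)[1])                        # [2, 3]
print(is_single_headed(graph, len(tokens) - 1))  # False
```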
To create high-quality annotations, we take advantage of an existing treebank, viz. the Chinese TreeBank (CTB), which is grounded in Government and Binding theory.
We define a set of linguistic rules to explore CTB's implicit phrase structural information and build deep dependency graphs.
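The abstract does not spell out the conversion rules. As a point of reference only, the sketch below shows the classic head-percolation baseline for turning a bracketed phrase-structure tree into bi-lexical dependencies. The head table, the tree, and the function names are hypothetical and far smaller than any real CTB rule set, and the empty category (*PRO*) that CTB would place in the embedded clause is omitted. Such a baseline yields exactly one head per word, so the understood subject of 去 is lost; recovering arcs like that from CTB's implicit structural information is the kind of work richer linguistic rules are needed for.

```python
import re

# Hypothetical head table: phrase label -> (search direction, preferred head labels).
HEAD_RULES = {
    "IP": ("right", ["VP", "IP"]),
    "VP": ("left", ["VV", "VP"]),
    "NP": ("right", ["NN", "NR", "PN", "NP"]),
}


def parse_tree(text):
    """Parse '(LABEL child ...)' bracketing into nested lists [label, *children]."""
    toks = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = toks[pos]
        pos += 1
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(node())
            else:
                children.append(toks[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return [label, *children]

    return node()


def to_dependencies(tree):
    """Head-percolation conversion; returns (words, unlabeled arcs (head, dep))."""
    words, arcs = [], []

    def visit(t):
        label, children = t[0], t[1:]
        if len(children) == 1 and isinstance(children[0], str):  # preterminal
            words.append(children[0])
            return len(words)  # 1-based word index
        child_heads = [visit(c) for c in children]
        child_labels = [c[0] for c in children]
        direction, preferred = HEAD_RULES.get(label, ("left", []))
        order = list(range(len(children)))
        if direction == "right":
            order.reverse()
        # First preferred label found in search order; else the first child searched.
        head = next((i for p in preferred for i in order if child_labels[i] == p), order[0])
        for i, h in enumerate(child_heads):
            if i != head:
                arcs.append((child_heads[head], h))
        return child_heads[head]

    arcs.append((0, visit(tree)))  # attach the sentence head to ROOT (0)
    return words, arcs


words, arcs = to_dependencies(parse_tree(
    "(IP (NP (PN 他)) (VP (VV 想) (IP (VP (VV 去) (NP (NR 北京))))))"))
print(words)         # ['他', '想', '去', '北京']
print(sorted(arcs))  # [(0, 2), (2, 1), (2, 3), (3, 4)]
```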
The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation.
Based on the converted corpus, we study data-driven models, both graph-based and transition-based, for Chinese GR parsing.
For graph-based parsing, we propose graph merging, a new perspective for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs.
We discuss two key problems in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph.
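A deliberately naive sketch of the two problems, assuming arcs are (head, dependent, label) triples: decompose() splits a multi-headed graph into single-headed subgraphs by sending the m-th head of every word to subgraph m, and merge() recombines them by taking the union of their arcs. Both functions and the example graph are hypothetical. Single-headed subgraphs are a simplification of tree-shaped ones, and a plain union offers no guarantee of coherence when the subgraphs are predicted by separate parsers, which is the difficulty problem (2) refers to.

```python
from collections import defaultdict


def decompose(arcs):
    """Split a multi-headed graph into single-headed subgraphs.

    The m-th arc entering each dependent (in sorted order) goes to subgraph m,
    so the number of subgraphs equals the maximum in-degree of the graph.
    """
    incoming = defaultdict(list)
    for arc in arcs:
        incoming[arc[1]].append(arc)
    k = max((len(v) for v in incoming.values()), default=0)
    subgraphs = [set() for _ in range(k)]
    for dep_arcs in incoming.values():
        for m, arc in enumerate(sorted(dep_arcs)):
            subgraphs[m].add(arc)
    return subgraphs


def merge(subgraphs):
    """Naive combination: the union of all subgraph arcs (no coherence checks)."""
    merged = set()
    for g in subgraphs:
        merged |= g
    return merged


# Hypothetical toy graph in which word 1 has two heads (2 and 3).
graph = {(0, 2, "root"), (2, 1, "subj"), (2, 3, "comp"), (3, 1, "subj"), (3, 4, "obj")}
parts = decompose(graph)
print(len(parts))             # 2
print(parts[1])               # {(3, 1, 'subj')}
print(merge(parts) == graph)  # True
```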
For transition-based parsing, we introduce a neural parser based on a list-based transition system.
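The abstract does not give the transition inventory. For orientation, here is a minimal sketch of a list-based system in the style of Nivre's list-based (Covington-style) algorithm, with a configuration (λ1, λ2, buffer, arcs) and a static oracle that replays a gold graph. The action names and one design choice are assumptions: arc actions leave the word i in λ1, so a dependent can receive further arcs, and only NO-ARC moves i to λ2. The neural scorer, the paper's exact action set, and the dynamic oracle are not shown.

```python
def step(config, action, label=None):
    """Apply one transition to config = (lam1, lam2, buffer, arcs)."""
    lam1, lam2, buf, arcs = config
    if action == "SHIFT":  # lam1 := lam1 . lam2 . [j]; lam2 := []
        return (lam1 + lam2 + [buf[0]], [], buf[1:], arcs)
    i, j = lam1[-1], buf[0]
    if action == "LEFT-ARC":   # add arc j -> i
        return (lam1, lam2, buf, arcs | {(j, i, label)})
    if action == "RIGHT-ARC":  # add arc i -> j
        return (lam1, lam2, buf, arcs | {(i, j, label)})
    if action == "NO-ARC":     # move i to the front of lam2
        return (lam1[:-1], [i] + lam2, buf, arcs)
    raise ValueError(f"unknown action {action!r}")


def static_oracle(config, gold):
    """Pick the next transition that rebuilds the gold (head, dep, label) graph."""
    lam1, _, buf, arcs = config
    if not lam1:
        return ("SHIFT", None)
    i, j = lam1[-1], buf[0]
    for h, d, l in sorted(gold):
        if (h, d) == (j, i) and (h, d, l) not in arcs:
            return ("LEFT-ARC", l)
        if (h, d) == (i, j) and (h, d, l) not in arcs:
            return ("RIGHT-ARC", l)
    return ("NO-ARC", None)


def oracle_derivation(n, gold):
    """Run the oracle from the initial configuration for words 1..n (ROOT = 0)."""
    config = ([0], [], list(range(1, n + 1)), frozenset())
    actions = []
    while config[2]:  # stop when the buffer is empty
        action, label = static_oracle(config, gold)
        actions.append((action, label))
        config = step(config, action, label)
    return actions, set(config[3])


# Hypothetical toy graph in which word 1 has two heads.
gold = {(0, 2, "root"), (2, 1, "subj"), (2, 3, "comp"), (3, 1, "subj"), (3, 4, "obj")}
actions, arcs = oracle_derivation(4, gold)
print(arcs == gold)                                # True
print([a for a in actions if a[0] == "LEFT-ARC"])  # [('LEFT-ARC', 'subj'), ('LEFT-ARC', 'subj')]
```

This oracle visits every word pair, so a derivation has O(n²) transitions; since SHIFT is permitted whenever the buffer is non-empty in this sketch, a practical oracle could shift as soon as no gold arc links j to any word left in λ1.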
We also discuss several key problems in neural transition-based parsing, including dynamic oracles and beam search.
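Beam search itself is independent of the scoring model. A minimal, generic sketch follows: it keeps the beam_size highest-scoring partial derivations at each step, where a derivation's score is the sum of its per-action scores. The function signature and the toy two-step scoring table are hypothetical stand-ins for a transition system and a neural scorer; dynamic oracles, which concern training rather than decoding, are not covered.

```python
import heapq


def beam_search(initial, is_final, legal_actions, apply, score, beam_size=4):
    """Return (total_score, final_state, actions) of the best derivation found."""
    beam = [(0.0, initial, [])]
    while not all(is_final(state) for _, state, _ in beam):
        candidates = []
        for total, state, history in beam:
            if is_final(state):  # finished derivations keep competing unchanged
                candidates.append((total, state, history))
                continue
            for action in legal_actions(state):
                candidates.append(
                    (total + score(state, action), apply(state, action), history + [action])
                )
        # Prune to the k best; key= avoids comparing states when scores tie.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])


# Hypothetical toy problem: choose two actions; the state is the tuple of actions so far.
SCORES = {(): {"a": 1.0, "b": 0.9}, ("a",): {"a": 0.1, "b": 0.0}, ("b",): {"a": 5.0, "b": 0.0}}


def run(k):
    return beam_search((), lambda s: len(s) == 2, lambda s: ["a", "b"],
                       lambda s, a: s + (a,), lambda s, a: SCORES[s][a], beam_size=k)


print(run(1)[2])  # ['a', 'a']: greedy decoding commits to 'a' first
print(run(2)[2])  # ['b', 'a']: a wider beam recovers the better derivation
```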
Our evaluation gauges how successful data-driven models can be for Chinese GR parsing.
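The abstract does not name the evaluation metrics. Because predicted and gold GR graphs can contain different numbers of arcs, per-token attachment accuracy does not directly apply, so a natural assumption is precision, recall, and F1 over arcs, with and without labels. The sketch below computes them; the function name and the toy prediction are hypothetical.

```python
def arc_prf(predicted, gold, labeled=True):
    """Precision, recall and F1 over (head, dependent, label) arcs.

    With labeled=False, labels are dropped and only (head, dependent) pairs count.
    """
    if not labeled:
        predicted = {(h, d) for h, d, _ in predicted}
        gold = {(h, d) for h, d, _ in gold}
    correct = len(set(predicted) & set(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 2, "root"), (2, 1, "subj"), (2, 3, "comp"), (3, 1, "subj"), (3, 4, "obj")}
# Hypothetical prediction: one wrong label (2 -> 3) and one missing arc (3 -> 1).
pred = {(0, 2, "root"), (2, 1, "subj"), (2, 3, "obj"), (3, 4, "obj")}
print(arc_prf(pred, gold))                 # precision 0.75, recall 0.6
print(arc_prf(pred, gold, labeled=False))  # precision 1.0, recall 0.8
```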
The empirical analysis suggests several directions for future study.