Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations
Keywords:
Constituent Parsing, Sequence Labeling, Parsing Encodings, Syntax, Natural Language Processing
Abstract
Various encodings have been proposed to cast constituent parsing as a sequence labeling task.
However, unlike in the case of dependency parsing, existing comparisons have not
been entirely homogeneous and, to the best of our knowledge, there is no systematic evaluation
of these encodings under uniform configurations. A homogeneous evaluation needs to account
for various aspects that could influence results, either by controlling for these aspects to ensure
uniformity (e.g., network architecture, parameter settings, postprocessing of ill-formed output), or
by systematically analyzing their impact (e.g., the impact of binary versus arbitrary structures). In
this paper, we (1) compare different encodings comprehensively, both theoretically and empirically,
on a modern neural architecture and across nine languages, and (2) introduce new encodings
and variants, including an encoding that our analysis finds particularly accurate and compact.