Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations

Authors

Keywords

Constituent Parsing, Sequence Labeling, Parsing Encodings, Syntax, Natural Language Processing

Abstract

Various encodings have been proposed to cast constituent parsing in terms of a sequence
labeling task. However, unlike in the case of dependency parsing, existing comparisons have not
been entirely homogeneous and, to the best of our knowledge, there is no systematic evaluation
of these encodings under uniform configurations. A homogeneous evaluation needs to account
for various aspects that could influence results, either by controlling for these aspects to ensure
uniformity (e.g., network architecture, parameter settings, postprocessing of ill-formed output), or
by systematically analyzing their impact (e.g., the effect of binary versus arbitrary structures). In
this paper, we (1) compare different encodings comprehensively, both theoretically and empirically,
on a modern neural architecture and across nine languages, and (2) introduce new encodings
and variants, including one that our analysis finds particularly accurate and compact.

Author Biographies

  • David Vilares, Universidade da Coruña, CITIC

    Assistant Professor, Dept. of Computer Science and Information Technologies, Universidade da Coruña

  • Carlos Gómez-Rodríguez, Universidade da Coruña, CITIC

    Full Professor, Dept. of Computer Science and Information Technologies, Universidade da Coruña

Published

2026-07-01