Greedy Transition-based Dependency Parsing with Stack-LSTMs
Abstract
We introduce a greedy transition-based parser that learns representations of parser states using recurrent neural networks. Our primary innovation, which enables us to do this efficiently, is a new control structure for sequence-to-sequence neural networks: the stack LSTM. As with the conventional stack data structures used in transition-based parsing, elements can be pushed to or popped from the top of the stack in constant time; in addition, an LSTM maintains a continuous-space embedding of the stack contents.
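To make the idea concrete, the following is a minimal sketch of a stack LSTM, not the implementation used in this work: each push runs one LSTM step starting from the state at the current top, and each pop simply discards the top entry, so the summary of the remaining contents is restored in constant time. The toy LSTM cell, dimensions, and initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StackLSTM:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix covering the input, forget, output, and candidate gates.
        self.W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.d = hidden_dim
        # The stack holds (h, c, pushed_embedding) triples;
        # the first entry is the state of the empty stack.
        self.states = [(np.zeros(hidden_dim), np.zeros(hidden_dim), None)]

    def push(self, x):
        """Push an embedding and advance the LSTM from the current top state (O(1))."""
        h_prev, c_prev, _ = self.states[-1]
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        d = self.d
        i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
        g = np.tanh(z[3 * d:])
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        self.states.append((h, c, x))

    def pop(self):
        """Remove the top element; the previous stack summary is restored in O(1)."""
        assert len(self.states) > 1, "pop from empty stack"
        _, _, x = self.states.pop()
        return x

    def summary(self):
        """Continuous-space embedding of the current stack contents."""
        return self.states[-1][0]

# Usage: push two vectors, read the stack summary, pop one element.
s = StackLSTM(input_dim=4, hidden_dim=8)
s.push(np.ones(4))
s.push(np.full(4, 0.5))
print(s.summary().shape)   # (8,)
s.pop()
```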
We demonstrate that the stack LSTM helps to capture three facets of
a parser's state in an efficient way: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of transition actions taken by the parser,
and (iii) the complete contents of the stack of partially built tree
fragments, including their internal structures. In addition, we
compare two different word representations: (i) standard word vectors
based on look-up tables and (ii) character-based models of words. While the former works well in all languages, the latter improves the handling of out-of-vocabulary words, particularly in morphologically rich languages. Finally, we discuss the use of dynamic oracles in training the parser. These yield state-of-the-art performance, showing that it is possible to achieve one of the best published results with a linear-time, greedy, transition-based parser.
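As a rough illustration of the second kind of word representation, the sketch below composes a word vector from its characters by running a forward and a backward LSTM over the character embeddings and concatenating their final states, so even out-of-vocabulary words receive a vector as long as their characters are known. The helper names, vocabulary, and dimensions are assumptions for illustration, not the parser's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_run(seq, W, b, hidden_dim):
    """Run a single-layer LSTM over a sequence of embeddings; return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in seq:
        z = W @ np.concatenate([x, h]) + b
        d = hidden_dim
        i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
        g = np.tanh(z[3 * d:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def char_word_embedding(word, char_vocab, char_embs, W_f, b_f, W_b, b_b, hidden_dim):
    """Compose a word vector from characters; unseen characters map to index 0 (UNK)."""
    seq = [char_embs[char_vocab.get(ch, 0)] for ch in word]
    fwd = lstm_run(seq, W_f, b_f, hidden_dim)
    bwd = lstm_run(list(reversed(seq)), W_b, b_b, hidden_dim)
    return np.concatenate([fwd, bwd])

# Usage with small random parameters (illustrative dimensions).
rng = np.random.default_rng(0)
char_dim, hidden_dim = 8, 16
vocab = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
embs = rng.normal(scale=0.1, size=(len(vocab) + 1, char_dim))
W_f = rng.normal(scale=0.1, size=(4 * hidden_dim, char_dim + hidden_dim))
W_b = rng.normal(scale=0.1, size=(4 * hidden_dim, char_dim + hidden_dim))
b_f = np.zeros(4 * hidden_dim)
b_b = np.zeros(4 * hidden_dim)
vec = char_word_embedding("unseenword", vocab, embs, W_f, b_f, W_b, b_b, hidden_dim)
print(vec.shape)   # (32,): a vector even for a word absent from any word-level lookup table
```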