From Bags of Words to Sentences: Discriminative Syntax-Based Word Ordering for Text Generation

Authors

  • Yue Zhang
  • Stephen Clark

Abstract

Word ordering is a fundamental problem in natural language generation. In this article, we study word ordering in isolation: assuming that a multi-set of input words has been given, the task is to order them into a grammatical and fluent sentence. Our system uses a syntax-based approach, based on the grammar formalism Combinatory Categorial Grammar (CCG), and a discriminative model. Because the search is over both a likely string and a CCG derivation, the search space is massive, making discriminative training challenging. We develop a learning-guided search framework, based on best-first search, and investigate several alternative training algorithms. Two challenges for the framework are the fair comparison of alternative hypotheses, since competing hypotheses can be of radically different sizes, and the incorporation of negative hypotheses while performing search during training, since this slows down the search for the gold-standard goal hypothesis. We find that a scaled discriminative model with scores normalised by hypothesis size leads to improved performance, and allows the incorporation of negative examples. We report improved performance over existing work on a standard Wall Street Journal test set. The framework we present is flexible in that it allows constraints to be imposed on output word orders, and in practice could be applied to problems such as re-ordering the output of statistical machine translation systems.
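The abstract describes best-first search over partial word orderings, with model scores scaled by hypothesis size so that hypotheses of different lengths can be compared fairly on one agenda. The sketch below illustrates that idea only in miniature; it is not the paper's system. The bigram feature weights, the `best_first_order` function, and the toy input are hypothetical stand-ins for the learned CCG-derivation features and real inputs used in the paper.

```python
import heapq
from collections import Counter

# Hypothetical bigram feature weights standing in for a trained
# discriminative model (the paper uses rich CCG-derivation features).
WEIGHTS = {
    ("<s>", "the"): 1.0, ("the", "cat"): 0.8, ("cat", "sat"): 0.9,
    ("sat", "down"): 0.7, ("<s>", "cat"): -0.5, ("down", "sat"): -0.6,
}

def score(sequence):
    """Raw model score: sum of bigram feature weights."""
    pairs = zip(["<s>"] + sequence, sequence)
    return sum(WEIGHTS.get(p, 0.0) for p in pairs)

def best_first_order(bag_of_words):
    """Order a multiset of words with best-first search.

    Hypotheses of different sizes compete on one agenda, so each raw
    score is divided by hypothesis length before ranking (the
    'scaled' model idea from the abstract).
    """
    agenda = [(0.0, 0, [], Counter(bag_of_words))]  # (neg. scaled score, tiebreak, seq, remaining)
    tie = 0
    while agenda:
        neg_scaled, _, seq, remaining = heapq.heappop(agenda)
        if not remaining:                 # goal: every input word has been placed
            return seq, -neg_scaled
        for word in list(remaining):
            new_seq = seq + [word]
            new_remaining = remaining - Counter([word])
            scaled = score(new_seq) / len(new_seq)   # size-normalised score
            tie += 1
            heapq.heappush(agenda, (-scaled, tie, new_seq, new_remaining))
    return None, float("-inf")

if __name__ == "__main__":
    words, scaled_score = best_first_order(["sat", "the", "down", "cat"])
    print(" ".join(words), scaled_score)
```

As a toy example it expands every permutation in the worst case; the paper's framework relies on the learned model to guide the search toward the goal hypothesis without exhaustive expansion.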

Published

2024-12-05

Issue

Section

Long paper