Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries

Authors

  • Advaith Siddharthan, Department of Computing Science, University of Aberdeen
  • Ani Nenkova, Department of Computer and Information Science, University of Pennsylvania
  • Kathleen McKeown, Computer Science Department, Columbia University

Abstract

While there has been much theoretical work on using various information status distinctions to explain the form of references in written text, there have been few studies that attempt to automatically learn these distinctions for generating references in the context of computer-regenerated text. In this paper, we present a model for generating references to people in news summaries that incorporates insights from both theory and a corpus analysis of human-written summaries. In particular, our model captures how two properties of a person referred to in the summary – familiarity to the reader and global salience in the news story – affect the content and form of the reference to that person in a summary. We demonstrate that these two distinctions can be inferred with high accuracy from a typical input for multi-document summarization and that they can be used to make generation decisions.

Published

2024-12-05

Section

Long paper