CODRA: A Novel Discriminative Framework for Rhetorical Analysis

Authors

  • Shafiq Rayhan Joty, Qatar Computing Research Institute (QCRI), a member of Qatar Foundation
  • Giuseppe Carenini, University of British Columbia
  • Raymond T. Ng, University of British Columbia

Abstract

Clauses and sentences rarely stand on their own in an actual discourse; rather, the relationships between them carry important information that allows the discourse to express a meaning as a whole beyond the sum of its individual parts. Rhetorical analysis seeks to uncover this coherence structure. In this article, we present CODRA, a COmplete probabilistic Discriminative framework for performing Rhetorical Analysis in accordance with Rhetorical Structure Theory, which posits a tree representation of a discourse.

CODRA comprises a discourse segmenter and a discourse parser. First, the discourse segmenter, which is based on a binary classifier, identifies the elementary discourse units in a given text. Then the discourse parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combine these two stages of parsing effectively. By conducting a series of empirical evaluations over two different datasets, we demonstrate that CODRA significantly outperforms the state of the art, often by a wide margin. We also show that a reranking of the k-best parse hypotheses generated by CODRA can potentially improve the accuracy even further.
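
The abstract does not spell out the parsing algorithm, so the following is only a minimal sketch of how an optimal tree could be recovered from constituent probabilities, assuming a CKY-style dynamic program over spans of elementary discourse units. The function `best_tree`, the scoring callback `span_prob`, and the toy probability table are all hypothetical; in CODRA the probabilities would come from the Conditional Random Fields, which are not modeled here, and relation labels are omitted.

```python
import math
from functools import lru_cache


def best_tree(n, span_prob):
    """Return (log_prob, tree) for the most probable binary tree over units 0..n-1.

    span_prob(i, k, j) is a hypothetical callback giving the probability that
    units i..k and units k+1..j are joined into one constituent covering i..j.
    All probabilities are assumed to be strictly positive.
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0.0, i  # a single unit is a leaf with log-probability 0
        best_lp, best_subtree = -math.inf, None
        for k in range(i, j):  # try every split point inside the span
            left_lp, left = solve(i, k)
            right_lp, right = solve(k + 1, j)
            lp = left_lp + right_lp + math.log(span_prob(i, k, j))
            if lp > best_lp:
                best_lp, best_subtree = lp, (left, right)
        return best_lp, best_subtree

    return solve(0, n - 1)


# Toy probabilities for three units (illustrative values only, not from the paper).
probs = {
    (0, 0, 1): 0.9,  # join unit 0 with unit 1
    (1, 1, 2): 0.4,  # join unit 1 with unit 2
    (0, 0, 2): 0.5,  # join unit 0 with the span 1..2
    (0, 1, 2): 0.6,  # join the span 0..1 with unit 2
}

log_prob, tree = best_tree(3, lambda i, k, j: probs[(i, k, j)])
print(tree, math.exp(log_prob))
```

With these toy values the left-branching tree `((0, 1), 2)` wins with probability 0.9 × 0.6 = 0.54, against 0.4 × 0.5 = 0.2 for the alternative. Working in log space avoids numerical underflow when many small probabilities are multiplied over longer texts.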

Published

2024-12-05

Issue

Section

Long paper