Corpus-Based Sentence Simplification

Authors

  • Fernando Alva Manchego, University of Sheffield
  • Carolina Scarton, University of Sheffield
  • Lucia Specia, University of Sheffield

Abstract

Text Simplification (TS) aims to modify a text to make it easier to read and understand. To do so, several rewriting operations can be performed, e.g., replacement, reordering, and splitting. Executing these text transformations while keeping sentences grammatical, preserving their meaning, and generating simpler output is a challenging and still unsolved problem. We propose to survey research on sentence-level simplification, focusing on approaches that attempt to learn how to simplify from corpora of aligned original-simplified sentence pairs (currently the dominant paradigm). We will also include a benchmark of different approaches on common datasets to compare them and highlight their strengths and limitations. This will be the first survey on corpus-based TS and the first to directly compare different approaches. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.

Author Biographies

  • Fernando Alva Manchego, University of Sheffield
    PhD Student, Department of Computer Science
  • Carolina Scarton, University of Sheffield
    Research Associate, Department of Computer Science
  • Lucia Specia, University of Sheffield
    Professor of Language Engineering, Department of Computer Science

Published

2024-12-05

Section

Survey article