Survey: Evaluating the Quality of Texts Produced by NLP Systems
Abstract
I survey techniques and experimental designs used to evaluate the quality of texts produced by NLP systems, including machine translation, natural language generation, and summarisation systems. I present evaluation as a type of scientific hypothesis testing, and the survey covers papers from the broader scientific community as well as from the NLP community.