Plagiarism meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Authors

  • Alberto Barrón-Cedeño Universitat Politècnica de Catalunya
  • Marta Vila Universitat de Barcelona
  • M. Antònia Martí Universitat de Barcelona
  • Paolo Rosso Universidad Politécnica de Valencia

Abstract

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this paper, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation.
Our experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.

Author Biographies

  • Alberto Barrón-Cedeño, Universitat Politècnica de Catalunya
    Postdoctoral researcher at TALP Research Center.
  • Marta Vila, Universitat de Barcelona
    PhD student at the General Linguistics Department.
  • M. Antònia Martí, Universitat de Barcelona
    Associate professor (professor titular d'universitat) at the General Linguistics Department.
  • Paolo Rosso, Universidad Politécnica de Valencia
    Associate professor (profesor titular de universidad) at the Department of Information Systems and Computation.

Published

2024-12-05

Section

Long paper