A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

Authors

  • Junchao Wu, University of Macau
  • Shu Yang, University of Macau
  • Runzhe Zhan, University of Macau
  • Yulin Yuan, University of Macau
  • Derek Fai Wong, University of Macau
  • Lidia Sam Chao, University of Macau

Abstract

The remarkable ability of large language models (LLMs) to comprehend, interpret, and generate complex language has rapidly integrated LLM-generated text into various aspects of daily life, where users increasingly accept it. However, the growing reliance on LLMs underscores the urgent need for effective detection mechanisms to identify LLM-generated text. Such mechanisms are critical to mitigating misuse and safeguarding domains like artistic expression and social networks from potential negative consequences. LLM-generated text detection, conceptualised as a binary classification task, seeks to determine whether an LLM produced a given text. Recent advances in this field stem from innovations in watermarking techniques, statistical detectors, and neural-based detectors. Human-assisted methods also play a crucial role. In this survey, we consolidate recent research breakthroughs in this field, emphasising the urgent need to strengthen detector research. Additionally, we review existing datasets, highlighting their limitations and developmental requirements. Furthermore, we examine various LLM-generated text detection paradigms, shedding light on challenges like out-of-distribution problems, potential attacks, real-world data issues, and ineffective evaluation frameworks. Finally, we outline intriguing directions for future research in LLM-generated text detection to advance responsible artificial intelligence (AI). This survey aims to provide a clear and comprehensive introduction for newcomers while offering seasoned researchers valuable updates in the field.

Author Biographies

  • Junchao Wu, University of Macau

    Junchao Wu is currently a Master’s student in Computational Linguistics at the University of Macau. He is a member of the Natural Language Processing & Portuguese–Chinese Machine Translation (NLP2CT) Research Group. His research interests include natural language processing and machine translation.

  • Shu Yang, University of Macau

    Shu Yang is working toward the Master’s degree in Computational Linguistics with the University of Macau, Macau, China. She is a member of the Natural Language Processing & Portuguese–Chinese Machine Translation (NLP2CT) Research Group. Her research interests include trustworthy AI, explainable AI, and human-computer interaction.

  • Runzhe Zhan, University of Macau

    Runzhe Zhan is currently working toward the Ph.D. degree in computer science with the University of Macau, Macau, China. He is a member of the Natural Language Processing & Portuguese–Chinese Machine Translation (NLP2CT) Research Group. His research interests include machine translation and natural language generation.

  • Yulin Yuan, University of Macau

    Yulin Yuan received his Ph.D. degree from Peking University and is currently a Chair Professor and Head of the Department of Chinese Language and Literature at the University of Macau. He has authored more than 100 articles in journals such as Contemporary Linguistics and Journal of Chinese Information Processing, as well as more than 10 books. He was selected as a Changjiang Scholar Distinguished Professor of the Ministry of Education of China and also a leading character in philosophy and social sciences under the National “Ten Thousand Talents Program”.

  • Derek Fai Wong, University of Macau

    Derek F. Wong is an Associate Professor in the Department of Computer and Information Science at the University of Macau, where he leads the Natural Language Processing & Portuguese–Chinese Machine Translation Laboratory (NLP2CT). He earned his Ph.D. from Tsinghua University, and his primary research interests include neural machine translation and natural language processing.

  • Lidia Sam Chao, University of Macau

    Lidia Sam Chao received a Ph.D. degree in Software Engineering from the University of Macau in 2008. Since 1996, she has been with the University of Macau. Her research focuses on data mining, machine learning, and knowledge acquisition in language and bioinformatics.

Published

2025-03-24