Towards Faithful Model Explanation in NLP: A Survey

Authors

  • Qing Lyu University of Pennsylvania
  • Marianna Apidianaki University of Pennsylvania
  • Chris Callison-Burch University of Pennsylvania

Abstract

End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, i.e., an explanation should accurately represent the reasoning process behind the model’s prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.

Author Biographies

  • Qing Lyu, University of Pennsylvania

    Qing Lyu is a fifth-year PhD student in Computer and Information Science at the University of Pennsylvania, advised by Chris Callison-Burch and Marianna Apidianaki. During the first two years of her doctoral studies, she worked on information extraction and schema learning. Her current research interests lie at the intersection of linguistics and natural language processing, especially probing language models for robustness and interpretability.

  • Marianna Apidianaki, University of Pennsylvania

    Marianna Apidianaki is a Senior Research Investigator in the Computer and Information Science (CIS) Department at the University of Pennsylvania, on leave from the French National Research Center (CNRS), where she holds a tenured researcher position. She has been working in computational linguistics for the past 15 years. She served as Senior Area Chair for Lexical Semantics at EACL 2021, and as Area Chair (AC) at ACL-IJCNLP 2021, ACL 2020, EMNLP 2021, EMNLP 2020, ACL 2019, EMNLP-IJCNLP 2019, *SEM 2019, and *SEM 2022. She was Program Chair of the 9th Joint Conference on Lexical and Computational Semantics (*SEM 2020), and co-chair of the SemEval Semantic Evaluation campaign from 2017 to 2019. She was an organizer of the Deep Learning Inside Out (DeeLIO) workshop on Knowledge Extraction and Integration for Deep Learning Architectures. She served as an elected member of the SIGLEX (ACL Special Interest Group on the Lexicon) Executive Board from 2013 to 2016, and has served on the SIGLEX Advisory Board since 2016. She serves as a (Senior) Action Editor for ARR and as a reviewer for the TACL journal. She was the PI of the MULTISEM project on "Advanced models for multilingual semantic processing", funded by the French National Research Agency (ANR) (2016-2021), and is currently a Co-PI of the IARPA PAUSIT grant.

  • Chris Callison-Burch, University of Pennsylvania

    Chris Callison-Burch is an associate professor of Computer and Information Science at the University of Pennsylvania. Before joining Penn, he was a research faculty member at the Center for Language and Speech Processing at Johns Hopkins University for six years. He served as General Chair of the ACL 2017 conference and as Program Co-Chair of the EMNLP 2015 conference. He was Chair of the Executive Board of NAACL from 2011 to 2013, and Secretary-Treasurer of SIGDAT from 2015 to 2017. He has served on the editorial boards of the journals Transactions of the ACL (TACL) and Computational Linguistics. He has more than 100 publications, which have been cited over 10,000 times. He is a Sloan Research Fellow, and he has received faculty research awards from Google, Microsoft, Amazon, and Facebook, in addition to funding from DARPA and the NSF. His research interests include natural language processing and crowdsourcing.

Published

2024-11-28