Defensive Dual Masking for Robust Adversarial Defense

Authors

  • Wangli Yang, University of Wollongong
  • Jie Yang, University of Wollongong
  • Yi Guo, Western Sydney University
  • Johan Barthelemy, NVIDIA

Keywords:

Adversarial Detection, Defensive Masking, Model Robustness, Adversarial Attacks, Adversarial Defense

Abstract

Adversarial defenses for textual data have gained considerable attention in recent years due to the growing vulnerability of Natural Language Processing (NLP) models to adversarial attacks. These attacks apply subtle perturbations to input text to deceive models, posing significant challenges to model robustness and reliability. This paper introduces Defensive Dual Masking, a simple yet effective algorithm that employs two distinct masking strategies to mitigate adversarial threats. During training, [MASK] tokens are inserted directly into input samples to prepare the model for handling perturbed inputs. At inference time, suspicious tokens are identified and replaced with [MASK] tokens, neutralizing perturbations while preserving the core semantics of the input text. A theoretical analysis of our method shows how the proposed masking strategies enhance the model's capacity to mitigate adversarial attacks. Empirical evaluations on four benchmark datasets under four adversarial attacks show that our method consistently outperforms state-of-the-art defense techniques, achieving higher robustness and substantial improvements in model accuracy. Furthermore, our method integrates readily with Large Language Models (LLMs), enhancing their resilience to adversarial attacks and providing a scalable defense for large-scale NLP applications.
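The abstract describes the two masking steps only at a high level. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that training-time masking inserts [MASK] tokens at random positions at a hypothetical `rate`, and that inference-time masking replaces the `k` tokens with the highest per-token suspicion score. The function names `insert_masks` and `mask_suspicious`, and the `scores` input, are hypothetical. The abstract does not specify how suspicious tokens are detected, so the scores are simply taken as given.

```python
import random
from typing import List, Sequence

MASK = "[MASK]"


def insert_masks(tokens: List[str], rate: float, rng: random.Random) -> List[str]:
    """Training-time sketch: insert [MASK] tokens at random positions.

    Assumption (hypothetical): about `rate * len(tokens)` masks are *inserted*,
    so the sequence grows and no original token is removed.
    """
    n_insert = round(rate * len(tokens))
    out = list(tokens)
    for _ in range(n_insert):
        # randint is inclusive on both ends, so a mask may also be appended.
        pos = rng.randint(0, len(out))
        out.insert(pos, MASK)
    return out


def mask_suspicious(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Inference-time sketch: replace the k most suspicious tokens with [MASK].

    Assumption (hypothetical): `scores` holds one suspicion score per token,
    produced by some detector that the abstract does not specify.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(0, min(k, len(tokens)))
    # Indices of the k highest-scoring tokens.
    chosen = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    return [MASK if i in chosen else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    sentence = "the movie was terrrible and boring".split()
    # Training view: one mask inserted at a random position (seeded for repeatability).
    print(insert_masks(sentence, rate=0.2, rng=random.Random(0)))
    # Inference view: the misspelled token has the highest (made-up) score.
    print(mask_suspicious(sentence, [0.1, 0.2, 0.1, 0.9, 0.1, 0.3], k=1))
    # → ['the', 'movie', 'was', '[MASK]', 'and', 'boring']
```

Insertion at training time follows the abstract's wording that [MASK] tokens are "directly inserted". Replacement at inference time keeps the sequence length unchanged, so each masked position stands in for exactly one suspected perturbation.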

Published

2026-06-27