Anthropocentric bias in language model evaluation

Authors

  • Charles Rathkopf, Forschungszentrum Jülich
  • Raphaël Millière

Keywords:

language models, cognitive evaluation, anthropocentric bias, metalinguistic prompting, grammaticality judgment, auxiliary oversight, mechanistic chauvinism, test-time scaling, latent competence

Abstract

Evaluating the cognitive capacities of large language models (LLMs) requires overcoming
not only anthropomorphic but also anthropocentric biases. This article identifies two types of
anthropocentric bias that have been neglected: overlooking how auxiliary factors can impede
LLM performance despite competence (auxiliary oversight), and dismissing LLM mechanistic
strategies that differ from those of humans as not genuinely competent (mechanistic chauvinism).
Mitigating these biases requires an empirical, iterative approach to mapping cognitive tasks to
LLM-specific capacities and mechanisms, achieved by supplementing behavioral experiments
with mechanistic studies.

Published

2026-06-27

Issue

Section

Squibs and Discussions