Can a Large Language Model Replace Humans at Rating Lexical Semantic Relations Strength?

André Fernandes dos Santos; José Paulo Leal

Authors

André Fernandes dos Santos, CRACS & INESC Tec LA / Faculty of Sciences, University of Porto, https://orcid.org/0000-0001-6410-9740
José Paulo Leal, CRACS & INESC Tec LA / Faculty of Sciences, University of Porto, https://orcid.org/0000-0002-8409-0300

Keywords:

Semantic Measures, Semantic Similarity, Semantic Relatedness, Large Language Models (LLMs), Semantic Relations Datasets

Abstract

This paper investigates the ability of large language models (LLMs) to evaluate semantic relations between word pairs by examining their alignment with human-generated semantic ratings. Semantic relations represent the degree of connection (e.g., relatedness or similarity) between linguistic elements and are traditionally validated against human-annotated datasets. Due to the challenges of building such datasets and recent progress in LLMs' capacity to model human-like understanding, we explore whether LLMs can serve as reliable substitutes for traditional human ratings.

We conducted experiments using multiple LLMs from OpenAI, Google, Mistral, and Anthropic, evaluating their performance across diverse English and Portuguese semantic relations datasets. The analysis includes PAP900, a recently published dataset of semantic relations in Portuguese, to examine whether an LLM's prior exposure to a dataset during training influences its ratings.

The results show that the LLM predictions correlate strongly with human ratings. The findings reveal the potential of LLMs to supplement or replace traditional semantic measure algorithms and crowd-sourced human annotations in semantic tasks.
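To make the notion of alignment concrete, the sketch below computes a rank correlation between human ratings and LLM ratings for the same word pairs. The abstract does not name the correlation coefficient used, so Spearman's rho is an assumption here, and the rating values are hypothetical toy data, not results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical ratings for five word pairs (illustrative values only).
human_ratings = [9.2, 7.5, 3.1, 0.8, 5.0]
llm_ratings = [9.0, 8.0, 2.0, 1.0, 4.5]

rho = spearman(human_ratings, llm_ratings)
print(f"Spearman correlation: {rho:.3f}")
```

A rank-based coefficient is a natural choice when the human and model ratings may sit on different scales, since only the ordering of the word pairs matters.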
