How to distinguish languages and dialects

Authors

  • Søren Wichmann, Leiden University, Kazan Federal University, Beijing Language University

Abstract

The terms ‘language’ and ‘dialect’ are ingrained in our usage, but linguists nevertheless tend to agree that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be identified as either two distinct languages or two dialects of one and the same language. A database of lexical information for more than 7,000 speech varieties, however, unveils a strong tendency for linguistic distances to be bimodally distributed. For a given language group, the linguistic distances pertaining to either cluster can be teased apart using the k-means technique, and the threshold separating them can be identified. Thresholds are remarkably consistent across datasets, qualifying their mean as a universal criterion for distinguishing between language and dialect pairs. A 95% confidence interval around the mean of the thresholds identified translates into a temporal distance of around a millennium (963–1242 years).
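The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `two_means_threshold`, the sample distances, and the choice to place the threshold at the midpoint between the two cluster centroids (the decision boundary of one-dimensional k-means with k = 2) are all assumptions made for illustration.

```python
from statistics import mean


def two_means_threshold(distances, max_iter=100):
    """Split 1-D distances into two clusters with k-means (k = 2).

    Returns (threshold, lower_cluster, upper_cluster). The threshold is
    assumed here to be the midpoint between the two centroids.
    """
    values = sorted(distances)
    if len(values) < 2 or values[0] == values[-1]:
        raise ValueError("need at least two distinct distance values")

    # Initialise the centroids at the extremes so neither cluster is empty.
    lo, hi = values[0], values[-1]
    for _ in range(max_iter):
        boundary = (lo + hi) / 2
        lower = [v for v in values if v <= boundary]
        upper = [v for v in values if v > boundary]
        new_lo, new_hi = mean(lower), mean(upper)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi

    return (lo + hi) / 2, lower, upper


# Hypothetical pairwise distances for one language group (not real data):
# small values stand for dialect pairs, large values for language pairs.
sample = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
threshold, dialect_pairs, language_pairs = two_means_threshold(sample)
print(round(threshold, 3), dialect_pairs, language_pairs)
```

Repeating this for each language group and averaging the resulting thresholds would give a single cut-off of the kind the abstract proposes as a universal criterion.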

Author Biography

  • Søren Wichmann, Leiden University, Kazan Federal University, Beijing Language University
    Leiden University Centre for Linguistics, Researcher

Published

2024-12-05

Issue

Section

Squibs and Discussions