How to distinguish languages and dialects
Abstract
The terms ‘language’ and ‘dialect’ are ingrained in our usage, but linguists nevertheless tend to agree that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be identified either as distinct languages or as two dialects of one and the same language. A database of lexical information for more than 7,000 speech varieties, however, reveals a strong tendency for linguistic distances to be bimodally distributed. For a given language group, the linguistic distances belonging to each of the two clusters can be teased apart using the k-means technique, and the threshold separating them identified. Thresholds are remarkably consistent across datasets, qualifying their mean as a universal criterion for distinguishing between language and dialect pairs. A 95% confidence interval around the mean of the identified thresholds translates into a temporal distance of around a millennium (963-1242 years).
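To illustrate the clustering step described in the abstract, the following is a minimal sketch of one-dimensional k-means with two clusters. It is not the authors' implementation: the function name `two_means_threshold`, the choice to start the centers at the smallest and largest distance, the use of the midpoint between the two final centers as the threshold, and the toy distance values are all assumptions made for illustration.

```python
from statistics import mean


def two_means_threshold(distances, max_iter=100):
    """Split 1-D distances into two clusters with k-means (k=2).

    Returns the midpoint between the two cluster centers, which is the
    boundary k-means itself uses to assign a value to a cluster.
    Hypothetical helper, not taken from the paper.
    """
    values = sorted(distances)
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct distances")

    # Assumption: initialise the two centers at the extremes.
    lo, hi = values[0], values[-1]
    for _ in range(max_iter):
        cut = (lo + hi) / 2
        # Assignment step: each value goes to the nearer center.
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        # Update step: each center moves to the mean of its cluster.
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Invented toy distances for one language group: a low cluster
# (dialect-like pairs) and a high cluster (language-like pairs).
toy_distances = [0.10, 0.15, 0.20, 0.25, 0.70, 0.75, 0.80, 0.85]
threshold = two_means_threshold(toy_distances)
print(round(threshold, 3))  # roughly 0.475 for this toy data
```

Under these assumptions, pairs with a distance below the returned value would fall in the dialect-like cluster and pairs above it in the language-like cluster. Repeating the procedure per language group and averaging the resulting thresholds would correspond to the cross-dataset mean the abstract refers to.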
Published
2024-12-05
Issue
Section
Squibs and Discussions