Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources

Yulia Tsvetkov; Shuly Wintner

Authors

Yulia Tsvetkov Carnegie Mellon University
Shuly Wintner Department of Computer Science, University of Haifa

Abstract

We propose a framework for employing multiple sources of linguistic information in the task of identifying multi-word expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multi-word expressions of various types and syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.
