Probabilistic Distributional Semantics with Latent Variable Models
Abstract
We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning. Our framework uses Bayesian latent-variable models inspired by, and extending, the well-known latent Dirichlet allocation (LDA) model of topical structure in documents; when applied to predicate-argument data, topic models automatically induce semantic classes of arguments and assign each predicate a distribution over those classes. We consider LDA and a number of extensions to the model and evaluate them on a variety of semantic prediction tasks, demonstrating that our approach attains state-of-the-art performance. More generally, we argue that probabilistic methods provide an effective and flexible methodology for distributional semantics.
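To make the abstract's setup concrete, the following is a minimal sketch (not the paper's implementation) of LDA applied to predicate-argument data: each predicate is treated as a "document" whose "words" are the head nouns seen in its argument slot, and a collapsed Gibbs sampler induces latent argument classes. The toy corpus, hyperparameter values, and variable names are all illustrative assumptions.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical toy corpus: predicate -> observed argument head nouns.
corpus = {
    "eat":   ["apple", "bread", "soup", "apple", "bread"],
    "drink": ["water", "wine", "soup", "water", "wine"],
    "drive": ["car", "truck", "bus", "car", "truck"],
    "park":  ["car", "bus", "truck", "car", "bus"],
}

K, alpha, beta = 2, 0.5, 0.1  # latent classes and Dirichlet priors (assumed values)
vocab = sorted({w for ws in corpus.values() for w in ws})
V = len(vocab)

# Count tables for the collapsed Gibbs sampler.
n_dk = {d: [0] * K for d in corpus}          # class counts per predicate
n_kw = [defaultdict(int) for _ in range(K)]  # word counts per class
n_k = [0] * K                                # total words per class
z = {d: [random.randrange(K) for _ in ws] for d, ws in corpus.items()}
for d, ws in corpus.items():
    for i, w in enumerate(ws):
        k = z[d][i]
        n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

for _ in range(200):  # Gibbs sweeps
    for d, ws in corpus.items():
        for i, w in enumerate(ws):
            k = z[d][i]
            # Remove the current assignment, then resample from the
            # conditional p(z=j | rest) ∝ (n_dk+alpha) * (n_kw+beta)/(n_k+V*beta).
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                       for j in range(K)]
            r = random.random() * sum(weights)
            for j in range(K):
                r -= weights[j]
                if r <= 0:
                    k = j
                    break
            z[d][i] = k
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

# A predicate's selectional preference is its distribution over classes.
prefs = {d: [(c + alpha) / (sum(cs) + K * alpha) for c in cs]
         for d, cs in n_dk.items()}
for d, p in sorted(prefs.items()):
    print(d, [round(x, 2) for x in p])
```

On data like this, food-taking and vehicle-taking predicates should end up concentrated on different latent classes, which is the sense in which the model "induces semantic classes of arguments".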