Learning Representations for Weakly Supervised Natural Language Processing Tasks

Authors

  • Fei Huang, Temple University
  • Arun Ahuja, Northwestern University
  • Doug Downey, Northwestern University
  • Yi Yang, Northwestern University
  • Yuhong Guo, Temple University
  • Alexander Yates, Temple University

Abstract

Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and information extraction, among other tasks, indicate that features taken from statistical language models, in combination with more traditional features, outperform traditional representations alone, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
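The abstract describes using the latent structure of a statistical language model as token-level features for a supervised tagger. The sketch below is a minimal illustration of that general idea, not the authors' implementation: it runs forward-backward over a toy HMM with randomly initialized parameters and concatenates each token's posterior distribution over latent states with a traditional one-hot word feature. In practice the HMM would be trained on large unlabeled corpora (e.g., with Baum-Welch) and the combined features fed to a tagger such as a CRF. All names, sizes, and parameter values here (K, V, posterior_state_features) are hypothetical.

```python
import numpy as np

# Toy HMM parameters (illustrative only; a real system would learn these
# from unlabeled text): K latent states, V word types.
rng = np.random.default_rng(0)
K, V = 5, 12
pi = np.full(K, 1.0 / K)                  # initial state distribution
A = rng.dirichlet(np.ones(K), size=K)     # K x K transition matrix
B = rng.dirichlet(np.ones(V), size=K)     # K x V emission matrix

def posterior_state_features(obs):
    """Scaled forward-backward: per-token posterior over latent states."""
    T = len(obs)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Combine HMM posteriors with a traditional one-hot word feature.
sentence = [3, 7, 1, 7, 0]                # word ids of one sentence
gamma = posterior_state_features(sentence)
one_hot = np.eye(V)[sentence]
features = np.hstack([one_hot, gamma])    # input to a downstream tagger
print(features.shape)                     # (5, V + K) = (5, 17)
```

Using soft posteriors rather than a single decoded state is one way such representations can help with sparse and polysemous words, since each token receives a distribution over latent states rather than a single cluster id.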

Published

2024-12-05

Section

Long paper