Sampling Tree Fragments from Forests

Authors

  • Tagyoung Chung University of Rochester
  • Licheng Fang University of Rochester
  • Daniel Gildea University of Rochester
  • Daniel Stefankovic University of Rochester

Abstract

We study the problem of sampling trees from forests,
in the setting where probabilities for each tree may be
be a function of arbitrarily large tree fragments.
This setting extends recent work for sampling to learn
Tree Substitution Grammars to the case where the
tree structure (TSG derived tree) is not fixed.
We develop an MCMC algorithm which corrects for
the bias introduced by unbalanced forests, and
present experiments using the algorithm to learn
Synchronous Context-Free Grammar rules for machine translation.
In this application, the forests being sampled represent the
set of Hiero-style rules that are consistent with fixed input
word-level alignments.  We demonstrate
equivalent machine translation performance to
standard techniques but with much smaller grammars.

Author Biography

  • Daniel Gildea, University of Rochester
    Assoc Prof, Computer Science

Published

2024-12-05

Issue

Section

Short paper