Sampling Tree Fragments from Forests
Abstract
We study the problem of sampling trees from forests,in the setting where probabilities for each tree may be
be a function of arbitrarily large tree fragments.
This setting extends recent work for sampling to learn
Tree Substitution Grammars to the case where the
tree structure (TSG derived tree) is not fixed.
We develop an MCMC algorithm which corrects for
the bias introduced by unbalanced forests, and
present experiments using the algorithm to learn
Synchronous Context-Free Grammar rules for machine translation.
In this application, the forests being sampled represent the
set of Hiero-style rules that are consistent with fixed input
word-level alignments. We demonstrate
equivalent machine translation performance to
standard techniques but with much smaller grammars.