Applications of Lexicographic Semirings to Problems in Speech and Language Processing

Authors

Richard Sproat Google, Inc
Mahsa Yarmohammadi Center for Spoken Language Understanding, Oregon Health & Science University
Izhak Shafran Center for Spoken Language Understanding, Oregon Health & Science University
Brian Roark Center for Spoken Language Understanding, Oregon Health & Science University

This paper explores lexicographic semirings and their application to problems in speech and

language processing. Specifically, we present two instantiations of binary lexicographic semirings,

one involving a pair of tropical weights, and the other a tropical weight paired with a

novel string semiring we term the categorial semiring. The first of these is used to yield an

exact encoding of backoff models with epsilon transitions. This lexicographic language model

semiring allows for off-line optimization of exact models represented as large weighted finitestate

transducers in contrast to implicit (on-line) failure transition representations. We present

empirical results demonstrating that, even in simple intersection scenarios amenable to the use

of failure transitions, the use of the more powerful lexicographic semiring is competitive in terms

of time of intersection. The second of these lexicographic semirings is applied to the problem

of extracting, from a lattice of word sequences tagged for part of speech, only the single bestscoring

part of speech tagging for each word sequence. We do this by incorporating the tags as a

categorial weight in the second component of a <Tropical,Categorial> lexicographic semiring,

determinizing the resulting word lattice acceptor in that semiring, and then mapping the tags

back as output labels of the word lattice transducer. We compare our approach to a competing

method due to Povey et al. (2012).