Multiword Expression Processing: A Survey
Abstract
Multiword Expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic in terms of behaviour and pervasive in terms of occurrence. The structure of linguistic processing tasks such as parsing and machine translation, which depend on the normally clear distinction between words and phrases, has to be re-thought to accommodate MWEs.
There is no shortage of proposed solutions. In fact, it is the emergence of solutions in the absence of guiding principles that motivates this survey, whose aim is not only to provide a focused review that sheds light on MWE processing in general, but also to clarify the nature of interactions between MWE processing and downstream applications such as parsing and machine translation. Our scope differs from that of existing surveys and reviews, which either concentrate primarily on the linguistic characteristics of MWEs or focus only on a specific part of MWE processing and some of the interactions. In short, existing reviews of MWE processing lack the big picture, and without it, it is difficult to compare individual solutions or to reveal that ostensibly different solutions might actually share similar characteristics. To overcome these shortcomings, we felt that it was necessary to create a framework within which both the problems and the different research contributions could be positioned.
The main contributions of the survey follow from the underlying characteristics of the framework. The first is a shared understanding of what is meant by the term “MWE processing” as well as a clear delineation of its two subtasks: discovery and identification. The second is an analysis of the interactions between MWE processing and applications-oriented linguistic processing. The third is what we refer to as “orchestration”: we found that many of the approaches adopted in recent literature could be usefully differentiated according to how MWE processing is timed with respect to the tasks associated with underlying linguistic processing. Hence the framework includes consideration of how orchestration choices affect the scope of an MWE-aware system.