Towards Automatic Error Analysis of Translation Output
Abstract
Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework
for automatic error analysis and classification based on the
identification of actual erroneous words using the algorithms for
computation of Word Error Rate (WER) and Position-independent word
Error Rate (PER). This is a first step towards the development of automatic evaluation measures that provide more specific information about particular translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors
in many different ways. This work focuses on one possible set-up, namely on five error categories: inflectional errors, errors due to wrong word order,
missing words, extra words and incorrect lexical choices. For each of
the categories, we analyse the contribution of various POS classes. We
compared the results of automatic error analysis with the results of
human error analysis in order to investigate two possible
applications: estimating the contribution of each error type in a
given translation output in order to identify the main sources of
errors for a given translation system, and comparing different
translation outputs using the introduced error categories in order to
obtain more information about the advantages and disadvantages of
different systems and possibilities for improvement, as well as about
the advantages and disadvantages of the methods applied for improvement. We used Arabic--English Newswire and Broadcast News and Chinese--English Newswire outputs created in the framework of the GALE project,
several Spanish and English European Parliament outputs
generated during the TC-Star project, and three German--English outputs
generated in the framework of the fourth Machine Translation Workshop
(WMT). We show that our results correlate very well with the results
of a human error analysis, and that all our metrics except the one for extra
words reflect well the differences between different versions of the same
translation system as well as the differences between different translation
systems.
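
To make the idea of identifying actual erroneous words more concrete, the following is a minimal Python sketch and not the implementation used in this work. It assumes whitespace-tokenised sentences and a single reference. The function names `wer_errors` and `per_errors` and the example sentences are hypothetical. The first function recovers the words involved in substitutions, deletions and insertions from a standard Levenshtein alignment, as used for WER; the second compares the two sentences as bags of words, as done for PER, and returns the words left uncovered on each side.

```python
from collections import Counter


def wer_errors(ref, hyp):
    """Levenshtein alignment of two token lists.

    Returns (edit_distance, ops), where ops lists only the error
    operations as (op, ref_word, hyp_word) with op in
    {"sub", "del", "ins"}.
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion (reference word missing)
                          d[i][j - 1] + 1)         # insertion (extra hypothesis word)

    # Trace back one optimal path to recover the erroneous words.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                ops.append(("sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def per_errors(ref, hyp):
    """Position-independent comparison of two token lists.

    Returns (ref_err, hyp_err): multisets of reference words not
    covered by the hypothesis, and of hypothesis words not covered
    by the reference.
    """
    ref_counts, hyp_counts = Counter(ref), Counter(hyp)
    return ref_counts - hyp_counts, hyp_counts - ref_counts


# Hypothetical example: the hypothesis contains the right words in a different order.
ref = "the cat sat on the mat".split()
hyp = "the cat on the mat sat".split()

distance, ops = wer_errors(ref, hyp)
ref_err, hyp_err = per_errors(ref, hyp)
print("WER:", distance / len(ref))
print("WER error words:", ops)
print("PER error words:", dict(ref_err), dict(hyp_err))
```

In this example the position-dependent alignment flags the word "sat" twice (once as missing, once as extra), while the position-independent comparison flags nothing. Contrasts of this kind between the two sets of erroneous words, possibly combined with base forms or POS tags, are the sort of information on which a classification into error categories can build.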