Latent Trees for Coreference Resolution
Abstract
We describe a structure learning system for unrestricted coreference resolution that explores two key modeling techniques: latent coreference trees and automatic entropy-guided feature induction. The latent tree modeling makes the learning problem computationally feasible, since it incorporates a meaningful hidden structure. Additionally, using an automatic feature induction method, we efficiently build enhanced nonlinear models with linear model learning algorithms. We present empirical results that highlight the contribution of each modeling technique used in the proposed system. Empirical evaluation is performed on the multilingual unrestricted CoNLL-2012 Shared Task datasets, which comprise three languages: Arabic, Chinese and English. We apply the same system to all languages, except for minor adaptations of some language-dependent features, such as nested mentions and specific static pronoun lists. A previous version of this system was submitted to the CoNLL-2012 Shared Task closed track, achieving an official score of 58.69, the best among all competitors. The only enhancement added to the current system version is the inclusion of candidate arcs linking nested mentions for the Chinese language. Including such arcs increases the score for that language by almost 4.5 points. The current system achieves a score of 60.15, which corresponds to a 3.5% error reduction, and is the best performing system for each of the three languages.
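The central structural idea, a coreference tree whose arcs link mentions to antecedents (with an artificial root for mentions that start a new entity), can be sketched as follows. This is a simplified illustration under assumed names, not the paper's implementation: `score(a, m)` stands in for an arc-scoring function (e.g. a linear model over arc features), and each mention greedily selects its highest-scoring candidate antecedent, which always yields a tree rooted at the artificial node.

```python
def best_coreference_tree(mentions, score):
    """Build a coreference tree over an ordered list of mentions.

    Returns a dict mapping each mention to its chosen antecedent.
    The antecedent ``None`` denotes the artificial root node,
    meaning the mention starts a new entity.  ``score(a, m)`` is an
    assumed arc-scoring function (hypothetical, for illustration).
    """
    tree = {}
    for i, m in enumerate(mentions):
        # Candidate antecedents: the artificial root plus all
        # mentions that precede m in the document.
        candidates = [None] + mentions[:i]
        # Each mention picks exactly one incoming arc, so the
        # resulting structure is a tree rooted at the artificial node.
        tree[m] = max(candidates, key=lambda a: score(a, m))
    return tree


if __name__ == "__main__":
    # Toy arc scorer: only the arc A -> C is attractive; the root
    # arc scores 0, all other mention-to-mention arcs score -1.
    def toy_score(a, m):
        if a is None:
            return 0.0
        return 1.0 if (a, m) == ("A", "C") else -1.0

    print(best_coreference_tree(["A", "B", "C"], toy_score))
```

Mentions whose antecedent is `None` each start their own entity; the predicted clusters are then read off as the connected components hanging from the root.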