OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction
Abstract
In 2004 we published in this journal an article describing OntoLearn, one of the first systems to automatically induce a taxonomy from documents and Web sites. Since then, OntoLearn has been an active area of research in our group and a reference work within the community. In this paper we describe our next-generation taxonomy learning methodology, which we name OntoLearn Reloaded. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions, and hypernyms. This results in a very dense, cyclic, and possibly disconnected hypernym graph. The algorithm then induces a taxonomy from the graph via optimal branching and a novel weighting policy. Our experiments show that we obtain high-quality results, both when building brand-new taxonomies and when reconstructing sub-hierarchies of existing taxonomies.
Published
2024-12-05
Section
Long Paper