OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction

Authors

  • Paola Velardi, Sapienza University of Rome
  • Stefano Faralli, Sapienza University of Rome
  • Roberto Navigli, Sapienza University of Rome

Abstract

In 2004 we published in this journal an article describing OntoLearn, one of the first systems to automatically induce a taxonomy from documents and Web sites. Since then, OntoLearn has been an active area of research in our group and a reference work within the community. In this paper we describe our next-generation taxonomy learning methodology, which we name OntoLearn Reloaded. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions, and hypernyms. This results in a very dense, cyclic, and possibly disconnected hypernym graph. The algorithm then induces a taxonomy from the graph via optimal branching and a novel weighting policy. Our experiments show that we obtain high-quality results, both when building brand-new taxonomies and when reconstructing sub-hierarchies of existing taxonomies.
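
To make the optimal-branching step concrete, the following is a minimal sketch of how a cyclic, possibly disconnected weighted graph can be reduced to a forest with the classic Chu-Liu/Edmonds procedure. It is an illustration under stated assumptions, not the implementation described in the paper: the edge direction (hypernym to hyponym), the toy weights, the virtual ROOT node, and the function names are all hypothetical, and the paper's weighting policy is not reproduced.

```python
ROOT = object()  # hypothetical virtual root used only by this sketch


def _max_arborescence(edges, root):
    """Chu-Liu/Edmonds: maximum-weight arborescence rooted at `root`.

    `edges` is a list of (parent, child, weight) tuples. Every non-root
    node is assumed to have at least one incoming edge.
    """
    # Step 1: keep the heaviest incoming edge of every non-root node.
    best = {}
    for u, v, w in edges:
        if v == root or u == v:
            continue
        if v not in best or w > best[v][2]:
            best[v] = (u, v, w)

    # Step 2: look for a cycle among the selected edges.
    cycle = None
    for start in best:
        seen, v = [], start
        while v in best and v not in seen:
            seen.append(v)
            v = best[v][0]
        if v in seen:
            cycle = seen[seen.index(v):]
            break
    if cycle is None:
        return list(best.values())

    # Step 3: contract the cycle into one fresh node and re-weight
    # the edges that enter it.
    in_cycle = set(cycle)
    c = ("cycle", tuple(cycle))
    new_edges, origin = [], {}
    for u, v, w in edges:
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            ne = (u, c, w - best[v][2])
        elif u in in_cycle:
            ne = (c, v, w)
        else:
            ne = (u, v, w)
        new_edges.append(ne)
        origin[ne] = (u, v, w)

    # Step 4: solve the smaller problem, then expand the cycle again,
    # dropping the cycle edge that points at the node entered from outside.
    result, entered = [], None
    for ne in _max_arborescence(new_edges, root):
        e = origin[ne]
        result.append(e)
        if ne[1] == c:
            entered = e[1]
    result += [best[v] for v in cycle if v != entered]
    return result


def optimal_branching(edges):
    """Maximum-weight branching (forest) of a directed weighted graph.

    A zero-weight edge from a virtual root to every node turns the
    problem into a rooted arborescence problem; those helper edges are
    removed from the result.
    """
    nodes = dict.fromkeys(n for u, v, _ in edges for n in (u, v))
    all_edges = list(edges) + [(ROOT, n, 0.0) for n in nodes]
    tree = _max_arborescence(all_edges, ROOT)
    return [e for e in tree if e[0] is not ROOT]


# Toy hypernym graph: (hypernym, hyponym, weight), with one noisy cycle.
toy_edges = [
    ("entity", "animal", 1.0),
    ("animal", "dog", 3.0),
    ("dog", "animal", 2.0),  # noisy edge that closes a cycle
    ("animal", "cat", 2.0),
]
taxonomy = optimal_branching(toy_edges)
print(sorted(taxonomy))
```

On this toy input the sketch keeps entity → animal, animal → dog, and animal → cat, and discards the noisy dog → animal edge, even though that edge is locally the heaviest one entering "animal". The virtual root is what lets a disconnected graph come out as a forest rather than failing.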

Published

2024-12-05

Issue

Section

Long paper