Text Representations for Patent Classification

Authors

  • Eva D'hondt, Radboud University Nijmegen
  • Suzan Verberne
  • Kees Koster
  • Lou Boves

Abstract

With the increasing rate of patent application filings around the world, optimizing automated patent classification is of rising economic importance. In this paper, we investigate how patent classification can be improved by using different text representations for the patent documents. We compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation. The classification experiments reported in this paper were carried out with the Linguistic Classification System (LCS) on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. We find that the addition of phrases always results in a significant improvement over the unigram baseline. The best results were achieved by extending unigrams with lemmatized bigrams. We perform extensive analyses of the class models (a.k.a. class profiles) created by the classifier in the LCS framework, to find out which types of phrases are most informative for patent classification.
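
As a rough illustration of what extending a bag-of-words representation with bigrams can look like, the sketch below counts single words and adjacent word pairs in one short text. It is not the LCS pipeline described in the paper: the function name `unigram_bigram_features`, the whitespace tokenization, the underscore-joined bigram format, and the sample sentence are all assumptions made for this example, and no lemmatization or dependency parsing is performed.

```python
from collections import Counter


def unigram_bigram_features(tokens):
    """Return term counts for unigrams plus adjacent-word bigrams.

    Hypothetical helper for illustration only; the feature format
    (bigrams joined with an underscore) is an assumption.
    """
    # Standard bag-of-words: count each token on its own.
    unigrams = Counter(tokens)
    # Statistical phrases: count each pair of neighbouring tokens.
    bigrams = Counter(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    # Merge both count tables into one feature vector.
    return unigrams + bigrams


# Assumed sample text, split on whitespace (no lemmatization).
tokens = "a method for classifying patent documents".split()
features = unigram_bigram_features(tokens)
```

In this sketch the feature vector holds both the unigram `patent` and the bigram `patent_documents`, so a classifier trained on such vectors can weight the phrase separately from its component words.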

Author Biography

  • Eva D'hondt, Radboud University Nijmegen
    Department of Linguistics, PhD student

Published

2024-12-05

Issue

Section

Short paper