Authorship Attribution with Topic Models

Authors

  • Yanir Seroussi
  • Ingrid Zukerman
  • Fabian Bohnert

Abstract

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by online users, such as emails and blogs. Authorship attribution of such online texts is a more challenging task than traditional authorship attribution, because such texts tend to be short and informal, and the number of candidate authors is often larger than in traditional settings. We address this challenge by employing topic models to obtain author representations. In addition to exploring novel ways of applying two popular topic models to this task, we develop a new model that projects authors and documents to two disjoint topic spaces. Employing our model in authorship attribution yields state-of-the-art performance on several datasets, containing either formal texts written by a few authors or informal texts generated by tens to thousands of online users. We also present experimental results that demonstrate the applicability of topical author representations to two other problems: inferring the sentiment polarity of texts and predicting the ratings that users would give to items such as movies.

Published

2024-12-05

Section

Long paper