Position Information in Transformers: An Overview

Authors

  • Philipp Dufter, Ludwig-Maximilians-Universität München

Abstract

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition, a Transformer is invariant with respect to reorderings of the input. However, language is inherently sequential, and word order is essential to the semantics and syntax of an utterance. In spite of its importance, the incorporation of position information into Transformer models has rarely been the main focus of previous work. As a result, it is nontrivial to thoroughly assess the individual properties of position models, let alone study their differences. We provide an exhaustive overview and comparison of existing work on encoding position information in Transformers. The objectives of this survey are to i) showcase that modeling position information in Transformers is a vibrant and extensive research area; ii) enable the reader to compare existing methods by providing a unified notation and meaningful clustering; iii) indicate what characteristics of an application should be taken into account when selecting a position encoding; and iv) provide stimuli for future research.
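The permutation-invariance claim above can be illustrated directly. The following is a minimal NumPy sketch (not taken from the survey): single-head scaled dot-product self-attention with no position information is permutation-equivariant, so reordering the input tokens merely reorders the outputs, whereas adding sinusoidal position encodings (as in the original Transformer) makes the outputs depend on word order. All function names and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal position encodings in the style of Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))            # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
perm = rng.permutation(seq_len)                    # a reordering of the input

# Without position information: permuting the input only permutes the output rows.
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
print(np.allclose(out[perm], out_perm))            # True -> permutation-equivariant

# With sinusoidal position encodings added: the outputs genuinely change.
P = sinusoidal_positions(seq_len, d_model)
out_pos = self_attention(X + P, Wq, Wk, Wv)
out_pos_perm = self_attention(X[perm] + P, Wq, Wk, Wv)
print(np.allclose(out_pos[perm], out_pos_perm))    # False -> word order now matters
```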

Author Biography

  • Philipp Dufter, Ludwig-Maximilians-Universität München
    PhD student at the Center for Information and Language Processing at the University of Munich.

Published

2024-12-05

Section

Survey article