How Much Does Lookahead Matter for Disambiguation? Partial Arabic Diacritization Case Study

Authors

Abstract

We propose a model for partial diacritization of deep orthographies. We focus on Arabic, where optionally marking selected vowels with diacritics can improve readability. Our partial diacritizer restores short vowels only where they make a given running text easier to understand. The idea is to mark those instances that require lookahead to disambiguate. We employ two independent neural networks: one takes the entire sentence as input, and the other considers only the text that has been read so far. Partial diacritization is then achieved by keeping those vowels on which the two networks disagree, preferring the reading based on the whole sentence. For evaluation, we prepared a new dataset of Arabic texts with both full and partial vowelization.
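
The selection rule described in the abstract can be illustrated with a minimal sketch. It assumes the two networks have already produced one diacritic label per letter; the function name partial_diacritize, its parameters, and the toy predictions are hypothetical and are not taken from the paper's implementation.

```python
from typing import Sequence

# Unicode combining marks for two Arabic short vowels.
FATHA = "\u064e"
DAMMA = "\u064f"


def partial_diacritize(
    letters: Sequence[str],
    full_context: Sequence[str],
    left_context: Sequence[str],
) -> str:
    """Keep a diacritic only where the two predictions disagree.

    letters       -- the undiacritized letters of the text
    full_context  -- one predicted diacritic per letter from the model
                     that sees the whole sentence ("" means no diacritic)
    left_context  -- one predicted diacritic per letter from the model
                     that sees only the text read so far

    Both prediction sequences are assumed inputs here; how the networks
    produce them is outside the scope of this sketch.
    """
    if not (len(letters) == len(full_context) == len(left_context)):
        raise ValueError("inputs must be aligned one-to-one")

    out = []
    for letter, full, left in zip(letters, full_context, left_context):
        out.append(letter)
        # Disagreement signals that lookahead was needed to disambiguate,
        # so the whole-sentence reading is the one that is kept.
        if full != left:
            out.append(full)
    return "".join(out)


if __name__ == "__main__":
    # Made-up predictions for a three-letter word; only the second
    # letter receives a mark because only there do the models differ.
    word = ["\u0643", "\u062a", "\u0628"]
    marked = partial_diacritize(
        word,
        full_context=[FATHA, FATHA, FATHA],
        left_context=[FATHA, DAMMA, FATHA],
    )
    print(marked)
```

Positions where both predictions agree are left bare, on the assumption that a reader moving left to right through the text would resolve them without help.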

Author Biographies

  • Saeed Esmail, Tel Aviv University
    Graduate Student, Computer Science
  • Kfir Bar, The College of Management, Israel
    Senior Lecturer, Computer Science

Published

2024-11-15