A Graph Based Framework for Structured Prediction Tasks in Sanskrit

Authors

  • Amrith Krishna, Indian Institute of Technology Kharagpur
  • Bishal Santra, Indian Institute of Technology Kharagpur
  • Pavankumar Satuluri, Chinmaya Vishwavidyapeeth
  • Ashim Gupta, Indian Institute of Technology Kharagpur
  • Pawan Goyal, Indian Institute of Technology Kharagpur

Abstract

We propose a framework using Energy Based Models for multiple structured prediction tasks in Sanskrit. Ours is an arc-factored model, similar to graph-based parsing approaches, and we consider the tasks of word segmentation, morphological parsing, dependency parsing, syntactic linearisation and poetry linearisation, a prosody-level task we introduce in this work. Ours is a search-based structured prediction framework that expects a graph as input, where relevant linguistic information is encoded in the nodes and the edges indicate the associations between these nodes. Typically, state-of-the-art models for morphosyntactic tasks in morphologically rich languages still rely on hand-crafted features for their performance. Here, we instead automate the learning of the feature function. The learnt feature function, along with the search space we construct, encodes relevant linguistic information for the tasks we consider. This enables us to substantially reduce the training data requirements, to as low as 10% of the data required by the neural state-of-the-art models. While the learning procedure is, in principle, language agnostic, the framework allows language-specific constraints to be incorporated to prune the search space and to filter the candidates during inference. We observe significant improvements in the syntax-level tasks due to incorporating these language-specific constraints. In all the tasks we discuss, we either achieve state-of-the-art results or ours is the only data-driven solution for those tasks.

Author Biographies

  • Amrith Krishna, Indian Institute of Technology Kharagpur
    PhD Student, Dept. of CSE, IIT Kharagpur
  • Bishal Santra, Indian Institute of Technology Kharagpur
    PhD Student, Dept. of CSE, IIT Kharagpur
  • Pawan Goyal, Indian Institute of Technology Kharagpur
    Assistant Professor, Dept. of CSE, IIT Kharagpur

Published

2024-12-05

Section

Long paper