A Strategy to Generate Bilingual Paraphrases with Compositional Distributional Semantics

Authors

Abstract

This paper describes a compositional distributional method for generating bilingual paraphrases from monolingual corpora. Bilingual paraphrasing is modeled in the same way as the contextualization of word meaning, but in a bilingual vector space. Meaning is contextualized through distributional composition within a vector space structured by syntactic dependencies, while the bilingual space is built by means of transfer rules and a bilingual dictionary. A phrase in the source language, consisting of a head and a dependent, is paraphrased into the target language by selecting both the nearest neighbor of the head given the dependent and the nearest neighbor of the dependent given the head.
This process is extended to larger phrases by means of incremental composition. Experiments were performed on English and Spanish monolingual corpora. A new dataset for evaluating strategies that generate bilingual paraphrases in restricted syntactic domains has been created and released.
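
The head-given-dependent step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the dictionary entries, the use of component-wise multiplication for composition, the use of cosine similarity, and all names (`translate`, `contextualize`, `nearest_neighbor`) are assumptions made for illustration only. The English head "catch" is restricted by the selectional preferences that the dependent "cold" imposes on its head, the result is mapped into the Spanish context space through a bilingual dictionary, and the closest Spanish verb is selected.

```python
import math

# Hypothetical bilingual dictionary over context words (English -> Spanish).
EN_ES = {"ball": "pelota", "fish": "pez", "disease": "enfermedad", "flu": "gripe"}

# Hypothetical vector of the English head "catch": object contexts and weights.
EN_CATCH = {"ball": 5.0, "fish": 4.0, "disease": 3.0}

# Hypothetical selectional preferences that the dependent "cold" imposes on
# its head, expressed in the same context space as the head vector.
PREF_FROM_COLD = {"disease": 3.0, "flu": 2.0, "ball": 0.2}

# Hypothetical vectors of candidate Spanish verbs.
ES_VERBS = {
    "atrapar": {"pelota": 5.0, "pez": 1.0},
    "pescar": {"pez": 6.0, "enfermedad": 1.0},
    "contraer": {"enfermedad": 5.0, "gripe": 3.0},
}


def translate(vector, dictionary):
    """Map source contexts to target contexts, dropping untranslatable ones."""
    out = {}
    for context, weight in vector.items():
        if context in dictionary:
            target = dictionary[context]
            out[target] = out.get(target, 0.0) + weight
    return out


def contextualize(word_vec, preference_vec):
    """Compose by component-wise multiplication (an assumed composition function)."""
    return {c: w * preference_vec[c] for c, w in word_vec.items() if c in preference_vec}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbor(query, candidates):
    """Return the candidate word whose vector is most similar to the query."""
    return max(candidates, key=lambda word: cosine(query, candidates[word]))


# Out of context, "catch" is closest to "atrapar" in this toy space.
print(nearest_neighbor(translate(EN_CATCH, EN_ES), ES_VERBS))

# Given the dependent "cold", the contextualized head is closest to "contraer".
in_context = contextualize(EN_CATCH, PREF_FROM_COLD)
print(nearest_neighbor(translate(in_context, EN_ES), ES_VERBS))
```

The symmetric step, selecting the nearest neighbor of the dependent given the head, would follow the same pattern with the roles of the two vectors swapped. Whether translation into the target space happens before or after composition in the actual method is not stated in the abstract; the order used here is a simplification.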

Published

2024-12-05

Issue

Section

Short paper