Curing the SICK and other NLI maladies
Abstract
Against the backdrop of ever-improving Natural Language Inference (NLI) models, recent efforts have focused
on the suitability of the current NLI datasets and on the feasibility of the NLI task as it is currently approached.
Many recent works have exposed the inherent human disagreement in the inference task and have proposed
a shift from categorical labels to subjective probability assessments that capture human uncertainty. In
this work, we show that neither the current task formulation nor the proposed uncertainty gradient is entirely
suitable for solving the NLI challenges. Instead, we propose an ordered sense space annotation, which distinguishes
between logical and common-sense inference. One end of the space captures nonsensical inferences, while the other
end represents strictly logical scenarios. In the middle of the space, we find a continuum of common sense, i.e.,
the subjective and graded opinion of a “person on the street”. To arrive at the proposed annotation scheme, we
perform a careful investigation of the SICK corpus and create a taxonomy of annotation issues and guidelines.
We re-annotate the corpus with the proposed annotation scheme and thoroughly evaluate it
by training and testing BERT models in various settings. Our work shows the efficiency and benefits of the
proposed mechanism and opens the way for a careful refinement of the NLI task.