Detecting Local Insights from Global Labels: Supervised & Zero-Shot Sequence Labeling via a Convolutional Decomposition
Abstract
Detecting, constraining, manipulating, and reasoning over features in the high-dimensional space that is language is the primary concern of modern computational linguistics, and a key focus of AI, more generally. We propose a general framework for addressing the first two of these four facets, coupling sharp feature detection with a matching method for introspecting inference.
Specifically, we present and analyze binary labeling via a convolutional decomposition (BLADE), a sequence labeling approach based on a decomposition of the filter-n-gram interactions of a convolutional neural network and a linear layer. BLADE can be viewed as a maxpool, attention-style mechanism on the final layer of a network, and it provides flexibility in producing predictions at---and defining loss functions for---varying label granularities, from the fully-supervised sequence labeling setting to the challenging zero-shot sequence labeling setting, in which we seek token-level predictions but only have access to document- or sentence-level labels for training. Importantly, BLADE enables a matching method, exemplar auditing, that is useful for analyzing the model and data and, empirically, as part of an inference-time decision rule. This introspection method provides a means, in some settings, of updating the model (via a database) without explicit re-training, opening the possibility for end-users to make local updates, or for annotators to progressively add fine-grained labels (or other meta-data).
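For intuition, the following is a minimal sketch of a maxpool decomposition of this general type, written in PyTorch with assumed shapes, names, and simplifications; it is an illustration of the underlying idea, not the paper's implementation.

```python
# Hypothetical sketch: a CNN + linear layer trained with a document-level label,
# whose document-level logit is decomposed into token-level contributions.
import torch
import torch.nn as nn

class BinaryCNNWithDecomposition(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, num_filters=100, kernel_size=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # 1D convolution over the token dimension, padded so each filter output
        # aligns with a token position.
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=kernel_size // 2)
        self.fc = nn.Linear(num_filters, 1)  # document-level binary score

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        feats = torch.relu(self.conv(x))              # (batch, num_filters, seq_len)
        pooled, argmax_pos = feats.max(dim=2)         # maxpool over time; keep the positions
        doc_logit = self.fc(pooled).squeeze(-1)       # global (document-level) prediction

        # Decomposition: each filter's (weight * maxpooled activation) is attributed
        # to the token position at which that filter fired maximally.
        # (The linear layer's bias is left at the document level in this sketch.)
        batch, _, seq_len = feats.shape
        contrib = pooled * self.fc.weight.squeeze(0)  # (batch, num_filters)
        token_scores = torch.zeros(batch, seq_len, dtype=feats.dtype, device=feats.device)
        token_scores.scatter_add_(1, argmax_pos, contrib)  # (batch, seq_len)
        return doc_logit, token_scores
```

In this sketch, token_scores.sum(dim=1) plus the linear layer's bias recovers doc_logit, which is the sense in which a global (document- or sentence-level) prediction is decomposed into local, token-level scores; the maxpooled filter vectors can also serve as the representations matched against a database for exemplar-auditing-style introspection.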
We assess this framework---and the limits of its suitability---on a series of binary classification tasks and at varying label resolutions. We demonstrate effectiveness for fully-supervised and zero-shot grammatical error detection, and we show that the exemplar database can be updated with out-of-domain data without updating the model parameters. We illustrate the text-analytic utility of the approach on sentiment analysis, revealing distinctive features in counterfactually-augmented and contrast-set re-annotations, and we consider the potential for exemplar auditing as an alternative to such targeted re-annotation. We also show that this strong sequence model can be used to guide synthetic text generation, which has concomitant implications for using such detection models to identify synthetic data. In supplementary material, we provide qualitative evidence that the approach can be a useful tool for document analysis and summarization, further demonstrating that this framework is applicable across NLP tasks, from low- to high-resource settings.