Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

Replacing paragraph reconstruction with a sentence content probe objective improves paragraph embeddings, yielding higher classification accuracy, faster training, and better generalization.

Year
2019
Venue
encouraging-paragraph-embeddings-to-remember-1
Authors
2
Hosting
Abstract only (ARXIV-DEFAULT)

Attribution

Abstract & full text
arxiv.org/abs/1906.03656 (ARXIV-DEFAULT)
TL;DR
Semantic Scholar

Abstract

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
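
The sentence content task described above can be pictured as binary classification over (paragraph, sentence) pairs: does the sentence occur in the paragraph or not? The sketch below is only an illustration of how such probe examples might be built, not the authors' implementation. The function name `make_sentence_content_examples`, the one-positive-one-negative sampling scheme, and the toy data are all assumptions.

```python
import random


def make_sentence_content_examples(paragraphs, seed=0):
    """Build (paragraph_index, sentence, label) triples for a sentence content probe.

    paragraphs: list of paragraphs, each a non-empty list of sentence strings.
    label is 1 if the sentence occurs in the paragraph, 0 otherwise.
    The sampling scheme here is an assumption made for illustration.
    """
    rng = random.Random(seed)
    examples = []
    for i, sentences in enumerate(paragraphs):
        # Positive example: a sentence drawn from the paragraph itself.
        examples.append((i, rng.choice(sentences), 1))

        # Negative example: a sentence from some other paragraph,
        # excluding any sentence that also appears in this one.
        candidates = [
            s
            for j, other in enumerate(paragraphs)
            if j != i
            for s in other
            if s not in sentences
        ]
        if candidates:
            examples.append((i, rng.choice(candidates), 0))
    return examples


# Toy data (hypothetical), two paragraphs of two sentences each.
toy_paragraphs = [
    ["The food was great.", "Service was slow."],
    ["The plot dragged.", "The acting was superb."],
]
toy_examples = make_sentence_content_examples(toy_paragraphs)
```

A probe classifier would then take a paragraph embedding and a sentence representation as input and predict the label. Under this framing, the training objective proposed in the abstract uses the same prediction task in place of paragraph reconstruction.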
