
Text Segmentation as a Supervised Learning Task

A supervised learning approach to text segmentation using a large automatically labeled dataset from Wikipedia is introduced and shown to generalize well to unseen natural text.

Year
2018
Authors
5


Abstract & full text
arxiv.org/abs/1803.09337
TL;DR
Semantic Scholar

Abstract

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity of labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develop a segmentation model based on this dataset and show that it generalizes well to unseen natural text.
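The supervised formulation can be sketched as follows, under stated assumptions. Each document is treated as a sequence of sentences. Each sentence gets a binary label: 1 if it ends a segment, 0 otherwise. Segment boundaries come "for free" from section headings, which is what lets a heading-structured source such as Wikipedia be labeled automatically. The heading syntax (a simplified "= Title =" form loosely modeled on MediaWiki markup), the regex sentence splitter, and the helper names extract_segments, to_labeled_sentences and from_labels are hypothetical illustrations. They are not the authors' extraction pipeline or model.

```python
import re
from typing import List, Tuple

# Hypothetical heading syntax, loosely modeled on MediaWiki "== Title ==" lines.
HEADING = re.compile(r"^\s*=+.*?=+\s*$")
# Naive sentence splitter (assumption): break on whitespace after . ! or ?
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_segments(text: str) -> List[List[str]]:
    """Split heading-marked text into segments, each a list of sentences."""
    segments: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if HEADING.match(line):
            # A heading closes the previous segment, if it had any sentences.
            if current:
                segments.append(current)
            current = []
            continue
        line = line.strip()
        if line:
            current.extend(s for s in SENT_SPLIT.split(line) if s)
    if current:
        segments.append(current)
    return segments


def to_labeled_sentences(segments: List[List[str]]) -> List[Tuple[str, int]]:
    """Label each sentence 1 if it is the last sentence of its segment, else 0."""
    labeled: List[Tuple[str, int]] = []
    for seg in segments:
        for i, sent in enumerate(seg):
            labeled.append((sent, int(i == len(seg) - 1)))
    return labeled


def from_labels(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Invert the labeling: rebuild segments from per-sentence (predicted) labels."""
    segments: List[List[str]] = []
    current: List[str] = []
    for sent, lab in zip(sentences, labels):
        current.append(sent)
        if lab == 1:
            segments.append(current)
            current = []
    if current:  # trailing sentences with no closing label form a final segment
        segments.append(current)
    return segments


if __name__ == "__main__":
    doc = (
        "= History =\n"
        "The town was founded in 1850. It grew quickly.\n"
        "= Geography =\n"
        "It lies on a river."
    )
    pairs = to_labeled_sentences(extract_segments(doc))
    print([lab for _, lab in pairs])  # → [0, 1, 1]
```

In this framing a trained model only has to predict the 0/1 label per sentence, and from_labels turns those predictions back into segments. The regex splitter will mis-split abbreviations such as "e.g.", so a real pipeline would likely use a proper sentence tokenizer.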
