MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

MediaSum, a large-scale media interview dataset, exhibits a distinctive positional bias and improves performance on other dialogue summarization tasks through transfer learning.

Year
2021
Venue
NAACL 2021
Authors
4

Abstract & full text
arxiv.org/abs/2103.06410v2

Abstract

We introduce MediaSum, a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overview and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is an order of magnitude larger and contains complex multi-party conversations from multiple domains. We conduct statistical analysis to demonstrate the unique positional bias exhibited in the transcripts of televised and radioed interviews. We also show that MediaSum can be used in transfer learning to improve a model's performance on other dialogue summarization tasks.
