Pre-trained Summarization Distillation

Recent state-of-the-art approaches to summarization use large pre-trained Transformer models. Distilling these models to smaller student models has become critically important for practical use; however, there are many different distillation methods proposed by the NLP…

Year: 2020
Venue: arXiv 2020
ArXiv: arxiv.org/abs/2010.13002
Authors: 2
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2010.13002v2
TL;DR: Semantic Scholar

Authors

Sam Shleifer, Alexander M. Rush