Recently, the development of pre-trained language models has pushed natural language processing (NLP) tasks to a new state of the art. In this paper, we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models on the same amount of text and for the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short text understanding. The overall conclusion, however, is that BERT-like structures yield no remarkable improvement for short text understanding. A data-centric method [12] can achieve better performance.
A Comprehensive Comparison of Pre-training Language Models
In a pre-training comparison of transformer-based models, adding RNN layers provided minimal improvement over BERT for short text understanding, highlighting the effectiveness of data-centric approaches.
- Year
- 2021
- Venue
- arXiv 2021
- Authors
- 1
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2106.11483v9
- TL;DR
- Semantic Scholar