A central question in natural language understanding (NLU) research is whether high performance demonstrates the models' strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. These involve removing instances of specific word classes and often lead to nonsensical sentences. Our results show that performance remains high on most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that they leverage other cues for prediction even in nonsensical contexts. Our proposed data transformations can be used to assess the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.
How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets
Language models maintain high performance on corrupted data, indicating they rely on cues beyond syntax and semantics for prediction.
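The corruption described in the abstract, removing every instance of a given word class, can be illustrated with a minimal sketch. This is not the authors' code: the function name `remove_word_class`, the coarse tag labels, and the hand-tagged example sentence are all assumptions made for illustration, and a real pipeline would presumably obtain tags from a part-of-speech tagger.

```python
from typing import List, Tuple

# Hypothetical toy input: (token, coarse POS tag) pairs written by hand.
# The tag names are illustrative assumptions, not taken from the paper.
TaggedSentence = List[Tuple[str, str]]


def remove_word_class(tagged: TaggedSentence, pos_to_drop: str) -> str:
    """Return the sentence with every token of the given word class removed."""
    kept = [token for token, pos in tagged if pos != pos_to_drop]
    return " ".join(kept)


example: TaggedSentence = [
    ("The", "DET"), ("cat", "NOUN"), ("chased", "VERB"),
    ("the", "DET"), ("mouse", "NOUN"),
]

no_nouns = remove_word_class(example, "NOUN")
no_verbs = remove_word_class(example, "VERB")
print(no_nouns)  # The chased the
print(no_verbs)  # The cat the mouse
```

Dropping all nouns leaves a string that is no longer a sensible sentence, which is the kind of input the abstract says models are fine-tuned or tested on.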
- Year
- 2022
- Venue
- *SEM (NAACL) 2022
- Authors
- 4
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2201.04467v2
- TL;DR
- Semantic Scholar