For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT, even when the target language is included in the multilingual model.
Adapting Monolingual Models: Data can be Scarce when Language Similarity is High
Zero-shot transfer learning with minimal data is effective for low-resource minority languages, especially when language similarity is high. After the lexical layer is retrained, monolingual BERT models generally outperform multilingual BERT on the downstream task.
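The recombination step described in the abstract (new lexical layer plus task-fine-tuned Transformer layers) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that model parameters are stored as a name-to-weights mapping, and that the lexical layer can be identified by the name prefix `bert.embeddings.word_embeddings.`. The function name `combine_state_dicts`, the prefix constant, and the toy weights are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict

# Assumption: the lexical (word embedding) parameters share this name prefix.
# The prefix is illustrative; adjust it to the naming of the model at hand.
LEXICAL_PREFIX = "bert.embeddings.word_embeddings."


def combine_state_dicts(
    lexical_source: Dict[str, Any],
    task_source: Dict[str, Any],
    lexical_prefix: str = LEXICAL_PREFIX,
) -> Dict[str, Any]:
    """Return a copy of task_source with its lexical parameters replaced.

    lexical_source: parameters of the model whose lexical layer was
        retrained on target-language text.
    task_source: parameters of the model whose Transformer layers were
        fine-tuned on the task (here POS tagging) in the source language.
    """
    combined = dict(task_source)  # shallow copy; inputs are left untouched
    for name, value in lexical_source.items():
        if name.startswith(lexical_prefix):
            combined[name] = value
    return combined


# Toy stand-ins for real weight tensors (nested lists instead of arrays).
target_language_model = {
    "bert.embeddings.word_embeddings.weight": [[0.1, 0.2]],
    "bert.encoder.layer.0.attention.self.query.weight": [[9.0]],
}
source_pos_tagger = {
    "bert.embeddings.word_embeddings.weight": [[1.0, 1.0]],
    "bert.encoder.layer.0.attention.self.query.weight": [[0.5]],
    "classifier.weight": [[0.3]],
}

combined = combine_state_dicts(target_language_model, source_pos_tagger)
```

In this sketch the combined mapping takes its word embeddings from the target-language model and everything else, including the tagging head, from the source-language tagger. With real models the two vocabularies usually differ in size, so the embedding matrix shape changes and the matching tokenizer has to be swapped in as well.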
- Year
- 2021
- Venue
- Findings (ACL) 2021 8
- Authors
- 4
- Hosting
- Abstract only (ARXIV-DEFAULT)
Attribution
- Abstract & full text
- arxiv.org/abs/2105.02855v2 (ARXIV-DEFAULT)
- TL;DR
- Semantic Scholar