This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data, even for relatively easy-to-annotate tasks such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning ones in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models perform nearly as well, owing to the low-resource nature of the problem.
On the logistical difficulties and findings of Jopara Sentiment Analysis
Transformer architectures outperform traditional machine learning models in sentiment analysis for Jopara, a code-switching language between Guarani and Spanish, despite lacking Guarani-specific pre-training data.
- Year
- 2021
- Venue
- NAACL (CALCS) 2021
- Authors
- 3
Attribution
- Abstract & full text
- arxiv.org/abs/2105.02947v2
- TL;DR
- Semantic Scholar