We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik's core emotions, plus neutral, to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
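The multilabel scheme described in the abstract can be pictured as a multi-hot vector over Plutchik's eight core emotions plus neutral, where one sentence may carry several emotions at once. The following is a minimal sketch in Python under stated assumptions: the label order, the index mapping, and the helper name `to_multi_hot` are illustrative and are not taken from the released XED files.

```python
# Plutchik's eight core emotions plus the neutral class from the abstract.
# ASSUMPTION: this ordering is illustrative, not the numbering used by XED.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]


def to_multi_hot(emotions):
    """Encode a collection of emotion names as a 0/1 vector over LABELS."""
    emotions = set(emotions)
    unknown = emotions - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


# A sentence annotated with two emotions at once (multilabel).
print(to_multi_hot({"joy", "trust"}))  # [0, 0, 0, 0, 1, 0, 0, 1, 0]
```

A vector of this shape is the usual target for a multilabel classifier, such as a BERT model with one sigmoid output per label, though the abstract does not specify the authors' exact setup.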
XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
XED, a multilingual emotion dataset with human annotations and projected labels, is evaluated using language-specific BERT models and SVMs, showing its utility for sentiment analysis and emotion detection.
- Year
- 2020
- Venue
- COLING 2020
- Authors
- 4
- Hosting
- Abstract only (ARXIV-DEFAULT)
Attribution
- Abstract & full text
- arxiv.org/abs/2011.01612v2 (ARXIV-DEFAULT)
- TL;DR
- Semantic Scholar