We present RONEC - the Named Entity Corpus for the Romanian language. The corpus contains over 26,000 entities, belonging to 16 distinct classes, in ~5,000 annotated sentences. The sentences were extracted from a copyright-free newspaper and cover several styles. This corpus is the first initiative in the Romanian language space specifically targeted at named entity recognition. It is available in BRAT and CoNLL-U Plus formats, and it is free to use and extend at github.com/dumitrescustefan/ronec.
Introducing RONEC -- the Romanian Named Entity Corpus
RONEC, a named entity corpus for Romanian, provides annotated sentences for named entity recognition in various styles.
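The abstract notes that RONEC is distributed in CoNLL-U Plus format. As a minimal sketch that assumes only the generic CoNLL-U Plus conventions (a `# global.columns =` header naming the tab-separated columns, `#` comment lines, and blank lines between sentences), such a file can be read in Python as shown here. The `ENTITY` column name, the BIO-style tags, and the two-sentence sample are hypothetical. Check the repository for RONEC's actual column names and tag scheme.

```python
from typing import Dict, List


def parse_conllup(text: str) -> List[List[Dict[str, str]]]:
    """Parse CoNLL-U Plus text into sentences, each a list of token dicts.

    Column names are taken from the '# global.columns =' header line.
    """
    columns: List[str] = []
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comments, e.g. '# sent_id = ...'
        elif not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        else:
            if not columns:
                raise ValueError("missing '# global.columns' header")
            current.append(dict(zip(columns, line.split("\t"))))
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample: the ENTITY column and its tags are illustrative only.
sample = (
    "# global.columns = ID FORM ENTITY\n"
    "# sent_id = 1\n"
    "1\tIon\tB-PERSON\n"
    "2\tlocuiește\tO\n"
    "3\tîn\tO\n"
    "4\tBucurești\tB-GPE\n"
    "\n"
    "# sent_id = 2\n"
    "1\tEa\tO\n"
    "2\tcitește\tO\n"
)

sents = parse_conllup(sample)
print([(t["FORM"], t["ENTITY"]) for t in sents[0] if t["ENTITY"] != "O"])
# prints [('Ion', 'B-PERSON'), ('București', 'B-GPE')]
```

Reading column names from the header, rather than hard-coding positions, keeps the sketch working if the file carries extra annotation columns.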
- Year
- 2019
- Venue
- arXiv 2019
- Authors
- 2
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/1909.01247v2
- TL;DR
- Semantic Scholar