The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
RuBQ: A Russian Dataset for Question Answering over Wikidata
A high-quality Russian knowledge base question answering dataset, RuBQ, is introduced, featuring machine translations, SPARQL queries, and entity linking.
- Year
- 2020
- Venue
- arXiv 2020
- Authors
- 2
- Hosting
- Abstract only (ARXIV-DEFAULT)
Attribution
- Abstract & full text
- arxiv.org/abs/2005.10659 (ARXIV-DEFAULT)
- TL;DR
- Semantic Scholar