RuBQ: A Russian Dataset for Question Answering over Wikidata

RuBQ is a high-quality Russian knowledge base question answering dataset that pairs each question with an English machine translation, a SPARQL query to Wikidata, and linked entities.

Year
2020
Venue
arXiv 2020
Authors
2
Hosting
Abstract only

Attribution

Abstract & full text
arxiv.org/abs/2005.10659
TL;DR
Semantic Scholar

Abstract

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
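
To make the "automatic generation of SPARQL queries" step more concrete, here is a minimal sketch of how a single-hop query could be assembled once a question has been linked to a Wikidata entity and property. The `LinkedQuestion` class, its field names, and the `build_sparql` helper are hypothetical and are not taken from the paper; the example question is illustrative and is not a record from RuBQ. The `wd:` and `wdt:` prefixes are the ones predefined by the Wikidata Query Service.

```python
from dataclasses import dataclass


@dataclass
class LinkedQuestion:
    """Hypothetical record: a question plus its linked Wikidata IDs."""
    text: str
    entity_id: str    # Wikidata item ID, e.g. "Q142" (France)
    property_id: str  # Wikidata property ID, e.g. "P36" (capital)


def build_sparql(q: LinkedQuestion) -> str:
    """Build a single-hop SPARQL query from the linked entity and property.

    This is an assumed template, not the authors' generation procedure.
    """
    return (
        "SELECT ?answer WHERE { "
        f"wd:{q.entity_id} wdt:{q.property_id} ?answer . "
        "}"
    )


# Illustrative input only; not drawn from the dataset.
example = LinkedQuestion(
    text="What is the capital of France?",
    entity_id="Q142",
    property_id="P36",
)
query = build_sparql(example)
print(query)
```

A query of this shape covers only the simplest case, a direct entity-property-answer triple; questions of higher complexity would need additional triple patterns, filters, or aggregation.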
