Hostility Detection Dataset in Hindi

A dataset of ~8200 manually annotated Hindi-language online posts is presented, covering four hostility dimensions plus a non-hostile label, released for the CONSTRAINT-2021 shared task.

Year: 2020
Venue: arXiv 2020
ArXiv: arxiv.org/abs/2011.03588
Authors: 5
Hosting: Abstract only

Attribution

Abstract & full text: arxiv.org/abs/2011.03588
TL;DR: Semantic Scholar

Abstract

In this paper, we present a novel hostility detection dataset in the Hindi language. We collect and manually annotate 8200 online posts. The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label. Because the hostile classes overlap significantly, hostile posts may carry multiple labels. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.
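
The multi-label scheme described in the abstract can be illustrated with a minimal Python sketch. The label strings, the `encode_labels` helper, and the rule that the non-hostile label excludes all hostile labels are assumptions made for illustration; they are not taken from the dataset's actual file format.

```python
# Hypothetical label names; the released dataset may spell them differently.
HOSTILE_LABELS = ["fake", "hate", "offensive", "defamation"]
NON_HOSTILE = "non-hostile"


def encode_labels(tags):
    """Turn a post's tag list into a multi-hot vector over the hostile classes.

    Assumption: a non-hostile post carries no hostile tag, so it maps to
    the all-zero vector.
    """
    tag_set = set(tags)
    if not tag_set:
        raise ValueError("a post needs at least one tag")
    if NON_HOSTILE in tag_set:
        if len(tag_set) > 1:
            raise ValueError("non-hostile cannot be combined with hostile tags")
        return [0] * len(HOSTILE_LABELS)
    unknown = tag_set - set(HOSTILE_LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tag_set else 0 for label in HOSTILE_LABELS]


# A post tagged as both hate speech and offensive sets two positions.
print(encode_labels(["hate", "offensive"]))
print(encode_labels(["non-hostile"]))
```

Under these assumptions, a coarse hostile versus non-hostile decision reduces to checking whether any position in the vector is set, while the fine-grained task predicts each position independently.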

Authors

Tanmoy Chakraborty, Md Shad Akhtar, Asif Ekbal, Amitava Das, Mohit Bhardwaj