0

COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework

As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently.

Preview
First page of COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework
Year
2024
Venue
arXiv 2024
Authors
5
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2408.03125ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently. In this paper, we introduce a code-mixed multilingual text annotation framework, COMMENTATOR, specifically designed for annotating code-mixed text. The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text. We perform robust qualitative human-based evaluations to showcase COMMENTATOR led to 5x faster annotations than the best baseline. Our code is publicly available at \url{https://github.com/lingo-iitgn/commentator}. The demonstration video is available at \url{https://bit.ly/commentator_video}.

Authors

5