Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models at the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is specifically designed to detect culturally-specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically scoring the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g., microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
RTP-LX, a multilingual human-annotated dataset, evaluates the toxicity detection capabilities of S/LLMs and highlights their limitations in understanding context-dependent and culturally-specific harm.
- Year
- 2024
- Venue
- arXiv 2024
- Authors
- 33
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2404.14397v2
- TL;DR
- Semantic Scholar
Authors
Si-Qing Chen, Minghui Zhang, Adrian de Wynter, Jongho Lee, Ishaan Watts, Tua Wongsangaroonsri, Noura Farra, Nektar Ege Altıntoprak, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, Andrey Zaikin