This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
Introducing v0.5 of the AI Safety Benchmark from MLCommons
The AI Safety Benchmark v0.5, developed by MLCommons, assesses the safety risks of chat-tuned language models using a taxonomy of 13 hazard categories and includes 43,090 test items.
- Year
- 2024
- Venue
- arXiv 2024
- Authors
- 100
Attribution
- Abstract & full text
- arxiv.org/abs/2404.12241v2
- TL;DR
- Semantic Scholar
Authors
Percy Liang, Jiacheng Zhu, Leon Derczynski, Bo Li, Zacharie Delpierre Coudert, Xianjun Yang, Brian Fuller, Adina Williams, Besmira Nushi, Eileen Peters Long, Alicia Parrish, Xudong Shen, Canyu Chen, Bhavya Kailkhura, Bertie Vidgen, Joaquin Vanschoren, Vithursan Thangarasa, Max Bartolo, Nino Scherrer, Patrick Schramowski, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Cody Coleman, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Zeyi Liao, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Luis Oala, Iftach Orr, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Alice Schoenauer Sebag, Abolfazl Shahbazi, Vin Sharma, Vamsi Sistla, Leonard Tang, Davide Testuggine, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Carole-Jean Wu, Poonam Yadav, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Peter Mattson