multilingual
Cohere
Cohere's multilingual evaluation suite covering 101 languages - open-ended generations, discriminative tasks, and human-rated preference data.
Google DeepMind
Hand-translated 250-problem subset of GSM8K in 10 languages - a multilingual grade-school math benchmark.
OpenAI
MMLU translated into 14 to 26 languages, depending on the variant (community variants exist); measures world knowledge and reasoning in non-English languages.
Cohere For AI's massively multilingual instruction dataset covering 65 languages, built by a 3,000-person open-science collaboration.
Multiple-choice evaluation environment for massive multilingual task understanding.
Environment for single-turn tasks in OpenBench.