Aya Dataset
Cohere For AI's massively multilingual instruction dataset covering 65 languages, built by a 3,000-person open-science collaboration.
- Type: SFT Dataset
- Publisher: Cohere
- Capabilities: Instruction Following, Multilingual
- Runtime: hf_parquet
- License: Apache-2.0
- Size: 204k human-curated rows (Aya Dataset) + 513M templated (Aya Collection)
- Published: May 2026
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| MGSM (Multilingual GSM8K) | Aya Dataset | - |
| Multilingual MMLU | Aya Dataset | - |
| Aya Evaluation Suite | Aya Dataset | - |
Models
Notable models trained on it
- Aya-101
- Aya-23 (8B, 35B)
- Aya Expanse