Aya Dataset

Cohere For AI's massively multilingual instruction dataset covering 65 languages, built by a 3,000-person open-science collaboration.

Type: SFT Dataset
Publisher: Cohere
Runtime: hf_parquet
License: Apache-2.0
Size: 204k human-curated rows (Aya Dataset) + 513M templated (Aya Collection)
Published: May 2026

Models

Notable models trained on it

Aya-101, Aya-23 (8B, 35B), Aya Expanse
