UltraFeedback
OpenBMB's 64k-prompt preference dataset, built from GPT-4 critiques that rate responses on instruction-following, truthfulness, honesty, and helpfulness. It is the de facto open DPO baseline.
- Type: Preference
- Publisher: OpenBMB
- Capabilities: Hallucination, Safety, Instruction Following
- Runtime: hf_parquet
- License: MIT
- Size: 64k prompts (~256k responses, ~340k pairs in binarized variant)
- Published: May 2026
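The size figures imply about four responses per prompt (256k / 64k), and four responses allow at most six pairwise comparisons. The sketch below shows one plausible way scored responses could be turned into chosen/rejected pairs for DPO-style training. It is an illustration under stated assumptions, not the dataset's actual binarization procedure: the function name `build_pairs`, the `prompt`/`chosen`/`rejected` keys, the example scores, and the rule of dropping ties are all hypothetical.

```python
from itertools import combinations


def build_pairs(prompt, scored_responses):
    """Return one chosen/rejected record for every response pair with unequal scores.

    `scored_responses` is a list of (text, score) tuples. The record keys are
    an assumed layout, not a confirmed schema of the dataset.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored_responses, 2):
        if score_a == score_b:
            continue  # assumption: ties carry no preference signal, so skip them
        if score_a > score_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Made-up example: four responses, two of which share the top score.
example = build_pairs(
    "Name a prime number greater than 10.",
    [("11", 5.0), ("12", 1.0), ("13 is prime.", 5.0), ("I am not sure.", 2.0)],
)
print(len(example))  # 5: six possible pairs minus the one tie
```

Under this assumed scheme the pair count per prompt ranges from zero to six, depending on how many scores coincide.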
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| AlpacaEval | UltraFeedback | - |
| MT-Bench | UltraFeedback | - |
| Arena-Hard | UltraFeedback | - |
Models
Notable models trained on it
Zephyr-7B-beta, Starling-7B, Notus, many Llama-3 / Mistral DPO fine-tunes