UltraFeedback

OpenBMB's 64k-prompt preference dataset, built with GPT-4 critiques that rate each response on instruction-following, truthfulness, honesty, and helpfulness. It is the de facto open baseline for DPO training.

Type: Preference
Publisher: OpenBMB
Capabilities: Instruction Following, Hallucination, Safety
Runtime: hf_parquet
License: MIT
Size: 64k prompts (~256k responses, ~340k pairs in binarized variant)
Published: May 2026
Canonical: huggingface.co/datasets/openbmb/UltraFeedback
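
The size line implies roughly four responses per prompt (256k / 64k), and four responses give at most six distinct comparison pairs per prompt. The sketch below shows one possible way to turn such a record into (chosen, rejected) pairs for DPO-style training. The field names `instruction`, `completions`, `response`, and `overall_score` are assumptions about the schema and should be checked against the dataset card; the pairing rule (every pair of unequal scores) is illustrative, and the published binarized variant may select pairs differently. The helper `load_ultrafeedback` is a hypothetical convenience wrapper around the standard `datasets.load_dataset` call.

```python
from itertools import combinations


def to_preference_pairs(record, score_key="overall_score"):
    """Turn one multi-response record into (chosen, rejected) pairs.

    Assumed record layout (not confirmed by this page):
    {"instruction": str, "completions": [{"response": str, score_key: float}, ...]}
    """
    pairs = []
    for a, b in combinations(record["completions"], 2):
        if a[score_key] == b[score_key]:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if a[score_key] > b[score_key] else (b, a)
        pairs.append({
            "prompt": record["instruction"],
            "chosen": chosen["response"],
            "rejected": rejected["response"],
        })
    return pairs


def load_ultrafeedback(split="train"):
    """Fetch the dataset from the Hugging Face Hub (needs `datasets` and network)."""
    from datasets import load_dataset  # imported lazily so the demo below runs offline
    return load_dataset("openbmb/UltraFeedback", split=split)


# Synthetic record for illustration only; not real dataset content.
example = {
    "instruction": "Name a prime number greater than 10.",
    "completions": [
        {"response": "11", "overall_score": 9.0},
        {"response": "12", "overall_score": 2.0},
        {"response": "13", "overall_score": 9.0},
        {"response": "I am not sure.", "overall_score": 4.0},
    ],
}

pairs = to_preference_pairs(example)
print(len(pairs))  # 5: six possible pairs minus one tie
```

On real data, the same function could be applied to each row returned by `load_ultrafeedback()`, after confirming the field names and handling any missing scores.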

Lift evidence

Eval	Tools known to lift	Source paper
AlpacaEval	UltraFeedback	-
MT-Bench	UltraFeedback	-
Arena-Hard	UltraFeedback	-

Models

Notable models trained on it

Zephyr-7B-beta, Starling-7B, Notus, and many Llama-3 / Mistral DPO fine-tunes

Papers

Introduced in: UltraFeedback: Boosting Language Models with High-quality Feedback

Contributors

Ganqu Cui, Lifan Yuan