IFBench

Frontier

Instruction-following benchmark measuring adherence to multi-step constraints.

Published
Jun 2025


Attribution

Leaderboard scores
AA

Top score: 80.5% by Qwen3.7 Max, with 324 models reporting (65 from frontier labs).

Score history

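[Chart: IFBench score (0% to 100%) over time for 324 models; x-axis ticks at Sep 23, Apr 24, Nov 24, Jun 25, Jan 26. Labeled models: Mistral 7B Instruct, Phi-4 Mini Instruct, Llama 3.1 Instruct 405B, Claude 3.5 Haiku, o3, GPT-5 Codex, GPT-5.2-Codex, Nemotron Cascade 2 30B A3B, Qwen3.7 Max.]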

Top models

[Bar chart: IFBench scores for the top 21 models. Highest value: Qwen3.7 Max at 80.5.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is IFBench?
Instruction-following benchmark measuring adherence to multi-step constraints.
What is the current top score on IFBench?
The top reported score is 80.5% by Qwen3.7 Max, across 324 models reporting (65 from frontier labs).
How can a model improve its IFBench score?
Tools linked to IFBench on Sophon include Ifbench RL Env (Community), Ifbench RL Env (Prime Intellect), and Ifbench RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.