IFBench
Frontier
Instruction-following benchmark measuring adherence to multi-step constraints.
- Publisher: Allen Institute for AI
- Published: Jun 2025
- Canonical: github.com/allenai/IFBench
Top score: 80.5% by Qwen3.7 Max, across 324 models reporting (65 from frontier labs)
Score history (324)
Top models (324)
Related tools (3)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is IFBench?
- Instruction-following benchmark measuring adherence to multi-step constraints.
- What is the current top score on IFBench?
- The top reported score is 80.5% by Qwen3.7 Max, across 324 models reporting (65 from frontier labs).
- How can a model improve its IFBench score?
- Tools linked to IFBench on Sophon include Ifbench RL Env (Community), Ifbench RL Env (Prime Intellect), and Ifbench RL Env (Community). The linked tools cover RL environments, datasets, and scaffolds that target this eval.
