IFEval
Instruction-Following Eval (IFEval) measures how well models follow explicit, verifiable formatting and content constraints in instructions.
About this test
- What it measures: instruction-following accuracy, that is, whether the model respects explicit constraints (word count, format, inclusion or exclusion of specific content).
- How it was administered: 541 prompts with verifiable constraints, each checked automatically for constraint satisfaction. Results are reported as prompt-level accuracy (the share of prompts where every constraint is satisfied) and instruction-level accuracy (the share of individual constraints satisfied).
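The difference between the two accuracy figures is easiest to see on a toy example. The sketch below is illustrative only and is not the benchmark's actual checking code: the checker helpers (`max_words`, `includes`), the `score` function, and the sample responses are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Checker = Callable[[str], bool]


# Hypothetical constraint checkers; each returns True when the response complies.
def max_words(limit: int) -> Checker:
    return lambda response: len(response.split()) <= limit


def includes(keyword: str) -> Checker:
    return lambda response: keyword.lower() in response.lower()


def score(examples: List[Tuple[str, List[Checker]]]) -> Dict[str, float]:
    """Compute prompt-level and instruction-level accuracy."""
    prompts_passed = 0
    instructions_passed = 0
    instructions_total = 0
    for response, checks in examples:
        results = [check(response) for check in checks]
        prompts_passed += all(results)       # prompt counts only if every check passes
        instructions_passed += sum(results)  # each check counts on its own
        instructions_total += len(results)
    return {
        "prompt_level": prompts_passed / len(examples),
        "instruction_level": instructions_passed / instructions_total,
    }


# Made-up responses paired with the constraints their prompts imposed.
EXAMPLES = [
    # 6 words and mentions "Paris": both constraints satisfied.
    ("Paris is the capital of France.", [max_words(10), includes("Paris")]),
    # 8 words exceeds the 5-word limit, but "forty" is present: one of two satisfied.
    ("The answer is definitely forty two, no doubt.", [max_words(5), includes("forty")]),
]

print(score(EXAMPLES))
```

Here one of two prompts passes in full (prompt-level 0.5), while three of four individual constraints are met (instruction-level 0.75), so instruction-level accuracy is never lower than prompt-level accuracy when every prompt carries at least one constraint.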
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|