GSM8K
Grade School Math 8K (GSM8K) is a dataset of 8.5K grade-school math word problems requiring multi-step arithmetic reasoning.
About this test
- What it measures
- Mathematical reasoning and multi-step problem solving.
- How it was administered
- Free-form numerical answers, generated with or without chain-of-thought reasoning; scored by exact match on the final numeric answer.
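Exact-match scoring on the final numeric answer can be sketched as follows. This is a minimal Python sketch, not this leaderboard's actual grader. It assumes the model's final answer is the last number in its output, which is a common heuristic that the page does not confirm. It relies on the GSM8K convention that each reference solution ends with `#### <answer>`. All function names are hypothetical.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally negative, allowing thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(number: str) -> str:
    """Canonicalize a numeric string: drop commas and trailing zeros after a decimal point."""
    number = number.replace(",", "")
    if "." in number:
        number = number.rstrip("0").rstrip(".")
    return number


def extract_last_number(text: str) -> Optional[str]:
    """Assumed heuristic: treat the last number in the text as the final answer."""
    matches = NUMBER_RE.findall(text)
    return normalize(matches[-1]) if matches else None


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions place the final answer after '####'."""
    answer = extract_last_number(solution.split("####")[-1])
    if answer is None:
        raise ValueError("reference solution has no numeric answer")
    return answer


def exact_match_accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Percentage of problems whose extracted final number equals the reference answer."""
    if len(outputs) != len(solutions):
        raise ValueError("outputs and solutions must have the same length")
    correct = sum(
        extract_last_number(out) == reference_answer(sol)
        for out, sol in zip(outputs, solutions)
    )
    return 100.0 * correct / len(solutions)


if __name__ == "__main__":
    outputs = ["She sold 48 + 24 = 72 clips. The answer is 72.", "The answer is 10."]
    solutions = ["Natalia sold 24 clips in May.\n#### 72", "#### 12"]
    print(exact_match_accuracy(outputs, solutions))  # 50.0
```

Normalizing commas and trailing zeros means that outputs such as `1,234.50` and `1234.5` count as the same answer. Real graders may apply stricter or looser rules.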
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | OpenAI | 97.8 | p99 | Text Generation, Reasoning, Proprietary |
| 2 | | DeepSeek | 97.3 | p99 | Text Generation, Reasoning, Open Weight, Large |