MATH
MATH contains 12,500 competition-style math problems (algebra, geometry, precalculus, etc.) from AMC and similar contests.
About this test
- What it measures
  - Hard mathematical reasoning and the ability to produce step-by-step solutions.
- How it was administered
  - Models give free-form answers, typically with chain-of-thought prompting; the final answer is parsed from each solution and compared to the ground truth.
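The parse-and-compare step can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual grader, which is not described here. It assumes the model wraps its final answer in `\boxed{...}`, as the MATH dataset's reference solutions do, and that a light string normalization is enough for comparison. The helper names `extract_boxed`, `normalize`, `is_correct`, and `accuracy` are hypothetical.

```python
import re
from typing import Optional, Sequence


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None."""
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Assumed normalization: trim, drop $ delimiters, unify \\frac variants."""
    answer = answer.strip().strip("$").rstrip(".")
    answer = answer.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    return re.sub(r"\s+", "", answer)  # remove all remaining whitespace


def is_correct(model_output: str, ground_truth: str) -> bool:
    """Exact match between the normalized parsed answer and the ground truth."""
    predicted = extract_boxed(model_output)
    return predicted is not None and normalize(predicted) == normalize(ground_truth)


def accuracy(outputs: Sequence[str], truths: Sequence[str]) -> float:
    """Fraction of problems answered correctly (0.0 for an empty set)."""
    if not outputs:
        return 0.0
    return sum(is_correct(o, t) for o, t in zip(outputs, truths)) / len(outputs)
```

Exact string matching after normalization is the simplest choice and will mark mathematically equivalent but differently written answers (for example `0.5` versus `\frac{1}{2}`) as wrong; a stricter harness would likely add symbolic equivalence checks.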
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | OpenAI | 94.8 | p99 | Text Generation, Reasoning, Proprietary |
| 2 | | DeepSeek | 93.5 | p99 | Text Generation, Reasoning, Open Weight, Large |