MATH Benchmark Rankings | BAUS.AI — AI Agents & Models Ranking

MATH

Name: MATH Benchmark Results
Creator: BAUS.AI

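MATH contains 12,500 competition mathematics problems drawn from AMC 10, AMC 12, AIME, and similar contests. The problems span seven subjects (prealgebra, algebra, number theory, counting and probability, geometry, intermediate algebra, and precalculus) and five difficulty levels, and each comes with a full step-by-step reference solution.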

What it measures: Challenging multi-step mathematical reasoning and the ability to work a problem through to a correct final answer.
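How it was administered: Free-form answers; the final answer (in MATH's reference solutions, the contents of \boxed{...}) is parsed and compared to the ground-truth answer. Models are typically prompted to use chain-of-thought.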
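The parse-and-compare step can be sketched in Python as follows. This is a minimal illustration, not BAUS.AI's actual grader: it assumes the model wraps its final answer in \boxed{...} (the convention in MATH's reference solutions), and the helper names, the normalization rules, and scoring as the percentage of matched answers are all hypothetical choices made for this example.

```python
from __future__ import annotations


def last_boxed_content(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None  # no boxed answer found
    content_start = start + len("\\boxed{")
    depth = 1
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # unbalanced braces: treat as unparseable


def normalize(answer: str) -> str:
    """Hypothetical light normalization: drop spaces, $ signs, and \\left/\\right,
    map \\dfrac and \\tfrac to \\frac, and strip a trailing period."""
    for token in ("\\left", "\\right", "$", " "):
        answer = answer.replace(token, "")
    answer = answer.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    return answer.rstrip(".")


def is_correct(model_output: str, ground_truth: str) -> bool:
    """Compare the extracted final answer to the reference after normalization."""
    predicted = last_boxed_content(model_output)
    return predicted is not None and normalize(predicted) == normalize(ground_truth)


def accuracy(outputs: list[str], answers: list[str]) -> float:
    """Percentage of problems whose extracted answer matches the reference."""
    correct = sum(is_correct(o, a) for o, a in zip(outputs, answers))
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    sample = "Adding the roots gives 7/2, so the answer is $\\boxed{\\dfrac{7}{2}}$."
    print(last_boxed_content(sample))          # prints \dfrac{7}{2}
    print(is_correct(sample, "\\frac{7}{2}"))  # prints True
```

Exact string matching after normalization will still mark some equivalent answers wrong (for example 0.5 versus \frac{1}{2}); graders that test symbolic equivalence avoid this at extra cost.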

Model rankings

Models ranked by score (accuracy, %) on this benchmark. Higher is better.

Rank	Model	Provider	Score	Percentile	Tags
1	GPT-o1	OpenAI	94.8	p99	Text Generation, Reasoning, Proprietary
2	DeepSeek R1	DeepSeek	93.5	p99	Text Generation, Reasoning, Open Weight, Large