ARC-Challenge Benchmark Rankings | BAUS.AI — AI Agents & Models Ranking

ARC-Challenge

Name: ARC-Challenge Benchmark Results
Creator: BAUS.AI

The AI2 Reasoning Challenge (ARC) Challenge set contains 2,590 grade-school science questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly.

What it measures: Scientific reasoning and commonsense knowledge that go beyond simple retrieval.
How it was administered: Multiple-choice with 4-5 answer options; evaluated 0-shot or 25-shot; scored by accuracy.
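The administration line packs three ideas together: multiple-choice items, a 0-shot or 25-shot prompt, and an accuracy score. The sketch below shows how these fit together. It is a minimal illustration under stated assumptions, not the procedure BAUS.AI or any particular harness uses. The `Question` type, the "Question: / A. / Answer:" prompt layout, and the `predict` callback are all hypothetical. In a k-shot run, `shots` holds k solved examples; this page does not say which examples a given evaluation used. Many harnesses pick an option by comparing log-likelihoods rather than generating a label, and `predict` abstracts over either choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Question:
    # Hypothetical record type; field names are illustrative assumptions.
    stem: str
    labels: Sequence[str]   # e.g. ("A", "B", "C", "D")
    choices: Sequence[str]  # answer texts, in the same order as labels
    answer: str             # gold label, e.g. "B"


def format_question(q: Question, with_answer: bool) -> str:
    """Render one item in an assumed 'Question / options / Answer' layout."""
    lines = [f"Question: {q.stem}"]
    lines += [f"{label}. {text}" for label, text in zip(q.labels, q.choices)]
    lines.append(f"Answer: {q.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(q: Question, shots: Sequence[Question]) -> str:
    """0-shot when `shots` is empty; k-shot places k solved examples first."""
    blocks = [format_question(s, with_answer=True) for s in shots]
    blocks.append(format_question(q, with_answer=False))
    return "\n\n".join(blocks)


def accuracy(
    questions: Sequence[Question],
    shots: Sequence[Question],
    predict: Callable[[str, Sequence[str]], str],
) -> float:
    """Percentage of questions whose predicted label matches the gold label."""
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(
        predict(build_prompt(q, shots), q.labels) == q.answer for q in questions
    )
    return 100.0 * correct / len(questions)


# Toy items for illustration only; they are not taken from the ARC dataset.
qs = [
    Question("Which is a liquid at room temperature?", ("A", "B", "C", "D"),
             ("iron", "water", "salt", "sand"), "B"),
    Question("What do plants use to make food?", ("A", "B", "C", "D"),
             ("sunlight", "rocks", "plastic", "metal"), "A"),
]
always_b = lambda prompt, labels: "B"  # stand-in for a real model
print(accuracy(qs, shots=[], predict=always_b))  # 50.0
```

Passing 25 solved items as `shots` turns the same call into a 25-shot evaluation. The scoring rule does not change.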

Model rankings

Models ranked by score on this benchmark. Higher is better.

Rank	Model	Provider	Score (accuracy, %)	Percentile	Tags
1	GPT-4o	OpenAI	96.3	—	Text Generation, Small, Multimodal, Reasoning, Proprietary
2	Claude 3 Haiku	Anthropic	96.0	—	Multimodal, Small, Text Generation, Proprietary
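Here is one way the ranking above could be produced: sort by score in descending order, so higher is better. The Percentile column is empty on this page and the site's formula is not stated. The sketch therefore assumes the textbook percentile rank, (models below + 0.5 × models tied, including itself) / total × 100. It also assumes competition-style ranks, where tied scores share the best rank. The function name `rank_models` is hypothetical.

```python
from typing import List, Tuple


def rank_models(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float, float]]:
    """Return (rank, model, score, percentile_rank) rows, best score first.

    Assumptions: competition ranking for ties ("1, 1, 3") and the standard
    percentile-rank formula; neither is confirmed by the source page.
    """
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    all_scores = [s for _, s in ordered]
    n = len(all_scores)
    rows = []
    for model, score in ordered:
        # Rank = 1 + number of models with a strictly higher score.
        rank = 1 + sum(1 for s in all_scores if s > score)
        below = sum(1 for s in all_scores if s < score)
        tied = sum(1 for s in all_scores if s == score)  # includes this model
        percentile = 100.0 * (below + 0.5 * tied) / n
        rows.append((rank, model, score, percentile))
    return rows


# Scores copied from the table above.
for row in rank_models([("Claude 3 Haiku", 96.0), ("GPT-4o", 96.3)]):
    print(row)
```

With the two listed models this prints `(1, 'GPT-4o', 96.3, 75.0)` followed by `(2, 'Claude 3 Haiku', 96.0, 25.0)`. A site using a different percentile definition would report different values.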