WinoGrande

Name: WinoGrande Benchmark Results
Creator: BAUS.AI

WinoGrande is a large-scale dataset of about 44K Winograd-style commonsense reasoning problems, built with adversarial filtering to reduce dataset-specific biases that models could otherwise exploit.
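Each problem is a sentence with a single blank and two candidate fillers, exactly one of which is correct. The sketch below shows that shape and how the two candidate sentences are produced. The field names (`sentence`, `option1`, `option2`, `answer`) and the `WinoItem`/`fill_blank` names are assumptions for illustration, not an official schema. The example item is the classic trophy/suitcase Winograd sentence, used here only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class WinoItem:
    # Hypothetical field names chosen for illustration only.
    sentence: str  # contains exactly one "_" marking the blank
    option1: str
    option2: str
    answer: int  # 1 or 2: which option correctly fills the blank


def fill_blank(item: WinoItem) -> tuple[str, str]:
    """Return the two candidate sentences, one per option."""
    if item.sentence.count("_") != 1:
        raise ValueError("expected exactly one blank '_' in the sentence")
    return (
        item.sentence.replace("_", item.option1),
        item.sentence.replace("_", item.option2),
    )


item = WinoItem(
    sentence="The trophy doesn't fit into the brown suitcase because the _ is too large.",
    option1="trophy",
    option2="suitcase",
    answer=1,  # the trophy is what is too large
)

for candidate in fill_blank(item):
    print(candidate)
```

Only one candidate is consistent with world knowledge (a container that is too large would not prevent the fit), which is what the benchmark asks a model to recognize.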

What it measures: commonsense reasoning, specifically resolving an ambiguous reference (a blank standing in for a pronoun) where choosing the right referent requires world knowledge.
How it was administered: Binary choice (fill-in-the-blank); accuracy metric; 5-shot evaluation.
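The administration setup above (binary choice, 5-shot, accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the scores on this page: `score` is a hypothetical stand-in for a model's likelihood of a text, the tuple layout of items is assumed, and the sketch scores the whole filled-in sentence (some evaluation harnesses instead score only the text after the blank).

```python
from typing import Callable, Sequence

# Hypothetical item layout: (sentence with one "_", option1, option2, answer in {1, 2}).
Item = tuple[str, str, str, int]
# Hypothetical scorer: higher means the model finds the text more likely.
Scorer = Callable[[str], float]


def fill(sentence: str, option: str) -> str:
    """Substitute an option into the blank."""
    return sentence.replace("_", option, 1)


def build_context(shots: Sequence[Item]) -> str:
    """Concatenate solved examples (blank filled with the correct option)."""
    return "\n".join(fill(s, o1 if ans == 1 else o2) for s, o1, o2, ans in shots)


def evaluate(items: Sequence[Item], shots: Sequence[Item], score: Scorer, k: int = 5) -> float:
    """k-shot binary-choice accuracy: predict whichever completion scores higher."""
    context = build_context(shots[:k])
    correct = 0
    for sentence, o1, o2, ans in items:
        s1 = score(context + "\n" + fill(sentence, o1))
        s2 = score(context + "\n" + fill(sentence, o2))
        pred = 1 if s1 >= s2 else 2
        correct += int(pred == ans)
    return correct / len(items)


# Toy scorer for demonstration only: prefers shorter texts.
def toy_score(text: str) -> float:
    return -len(text)


items = [
    ("The _ is too large.", "trophy", "suitcase", 1),
    ("The _ is too small.", "trophy", "suitcase", 2),
]
print(evaluate(items, shots=[], score=toy_score))
```

Because the toy scorer always prefers the shorter option ("trophy"), it gets the first item right and the second wrong, giving an accuracy of 0.5. A real evaluation would replace `toy_score` with a language model's log-likelihood and draw the five shots from held-out training items.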

Model rankings

Models ranked by score on this benchmark. Higher is better.

Rank	Model	Provider	Score (accuracy, %)	Percentile	Tags
1	GPT-4o	OpenAI	89.9	—	Text Generation, Small, Multimodal, Reasoning, Proprietary
2	GPT-4o mini	OpenAI	89.1	—	Code Assistant, Small, Text Generation, Multimodal, Proprietary
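The ranking rule ("higher is better") amounts to a descending sort on score. The sketch below reproduces the two rows above from their scores. How this page breaks ties is not stated, so giving tied scores a shared rank (1, 2, 2, 4 style) is an assumption, as is the `rank_models` name.

```python
# Scores copied from the table; tie handling below is an assumption.
scores = {"GPT-4o": 89.9, "GPT-4o mini": 89.1}


def rank_models(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Rank models by score, higher is better; tied scores share a rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    for i, (model, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # same score as previous model: reuse its rank
        else:
            rank = i + 1  # otherwise rank equals 1-based position
        ranked.append((rank, model, score))
    return ranked


for rank, model, score in rank_models(scores):
    print(f"{rank}\t{model}\t{score}")
```

With the two scores above this prints GPT-4o at rank 1 and GPT-4o mini at rank 2, matching the table.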