WinoGrande
WinoGrande is a large-scale dataset of 44K Winograd-style commonsense reasoning problems, built with adversarial filtering to remove problems that can be solved from superficial statistical cues.
About this test
- What it measures
  - Commonsense reasoning: deciding which of two candidate entities an ambiguous reference (shown as a blank) points to, a choice that requires world knowledge.
- How it was administered
  - Binary-choice fill-in-the-blank format, scored by accuracy, with 5-shot evaluation.
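The page does not say how each binary choice is scored, so the sketch below is an assumption, not this leaderboard's method. It shows one common approach, often called "partial" scoring: fill the blank with each option, then compare how likely a language model finds the rest of the sentence after the blank. Five solved examples are prepended as few-shot context, and accuracy is the fraction of items where the higher-scoring option matches the gold answer. `WinoItem`, `select_shots`, `build_contexts`, `predict`, `accuracy`, and the `logprob` callback are hypothetical names for illustration; `logprob` stands in for whatever model API returns log P(continuation | context).

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: returns log P(continuation | context) under some language model.
LogProbFn = Callable[[str, str], float]


@dataclass
class WinoItem:
    sentence: str  # contains exactly one "_" marking the blank
    option1: str
    option2: str
    answer: int    # gold option, 1 or 2


def select_shots(pool: Sequence[WinoItem], k: int = 5, seed: int = 0) -> List[WinoItem]:
    """Draw k solved examples (e.g. from a training split) to use as few-shot context."""
    return random.Random(seed).sample(list(pool), k)


def build_contexts(item: WinoItem, shots: Sequence[WinoItem]) -> Tuple[List[str], str]:
    """Return one context per option plus the shared continuation after the blank."""
    # Each shot is shown as a completed sentence with its correct option filled in.
    header = "".join(
        s.sentence.replace("_", s.option1 if s.answer == 1 else s.option2) + "\n\n"
        for s in shots
    )
    prefix, suffix = item.sentence.split("_", 1)
    contexts = [header + prefix + opt for opt in (item.option1, item.option2)]
    return contexts, suffix


def predict(item: WinoItem, shots: Sequence[WinoItem], logprob: LogProbFn) -> int:
    """Pick the option under which the text after the blank is most likely (ties go to option 1)."""
    contexts, continuation = build_contexts(item, shots)
    scores = [logprob(ctx, continuation) for ctx in contexts]
    return 1 if scores[0] >= scores[1] else 2


def accuracy(items: Sequence[WinoItem], shots: Sequence[WinoItem], logprob: LogProbFn) -> float:
    """Fraction of items where the predicted option matches the gold answer."""
    correct = sum(predict(it, shots, logprob) == it.answer for it in items)
    return correct / len(items)
```

With a real model, `accuracy(test_items, select_shots(train_items), logprob)` would yield a number similar in spirit to the Score column below, but prompt formatting, shot selection, and scoring rules vary between evaluation harnesses, so scores computed this way are not guaranteed to match the leaderboard.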
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score (accuracy, %) | Percentile | Tags |
|---|---|---|---|---|---|
| 1 |  | OpenAI | 89.6 | — | Text Generation, Small, Multimodal, Reasoning, Proprietary |
| 2 |  | Anthropic | 89.4 | — | Multimodal, Small, Text Generation, Proprietary |