WinoGrande
WinoGrande is a large-scale dataset of 44K Winograd-style commonsense reasoning problems, built with adversarial filtering.
About this test
- What it measures: commonsense reasoning, specifically resolving pronoun references that require world knowledge.
- How it was administered: binary choice (fill-in-the-blank), scored by accuracy, with 5-shot evaluation.
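The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the harness behind the scores on this page: the field names (`sentence`, `option1`, `option2`, `answer`), the `_` blank marker, the helper names, and the `always_first` chooser are all assumptions made for the example, and a real evaluation would substitute a model for the chooser.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]  # assumed keys: sentence, option1, option2, answer ("1" or "2")


def fill_blank(sentence: str, option: str) -> str:
    """Substitute a candidate into the blank (assumed to be marked with '_')."""
    return sentence.replace("_", option, 1)


def build_prompt(shots: List[Item], item: Item, k: int = 5) -> str:
    """Prepend up to k solved examples to the unsolved sentence (k-shot prompt)."""
    demos = [fill_blank(s["sentence"], s["option" + s["answer"]]) for s in shots[:k]]
    return "\n".join(demos + [item["sentence"]])


def accuracy(items: List[Item], choose: Callable[[Item], str]) -> float:
    """Fraction of items where the chosen option label matches the gold answer."""
    correct = sum(1 for it in items if choose(it) == it["answer"])
    return correct / len(items)


# Hypothetical toy items in the Winograd style.
items = [
    {"sentence": "The trophy doesn't fit into the suitcase because the _ is too large.",
     "option1": "trophy", "option2": "suitcase", "answer": "1"},
    {"sentence": "The trophy doesn't fit into the suitcase because the _ is too small.",
     "option1": "trophy", "option2": "suitcase", "answer": "2"},
]


def always_first(item: Item) -> str:
    """Stand-in for a model: always picks option 1."""
    return "1"


print(accuracy(items, always_first))  # prints 0.5
```

Because each problem has exactly two candidates, a chooser that ignores the sentence lands near 50% accuracy, which is the baseline to keep in mind when reading the scores below.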
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | Anthropic | 83.9 | — | Multimodal, Small, Text Generation, Proprietary |
| 2 | | Anthropic | 83.7 | — | Code Assistant, Small, Text Generation, Multimodal, Reasoning, Proprietary |