Chatbot Arena ELO
Chatbot Arena is a benchmark website that uses crowdsourced human preference votes to rank LLMs via an Elo rating system. Models are compared pairwise by anonymous human judges.
About this test
- What it measures
- Overall human preference for response quality in open-ended conversation.
- How it was administered
- Pairwise blind comparisons; crowdsourced votes from LMSYS Chatbot Arena; Elo ratings calculated from win/loss/tie records.
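The rating mechanism described above can be sketched with the classic Elo update rule. This is an illustrative sketch only: the constants (base rating, K-factor) are assumptions, and Chatbot Arena's published methodology has evolved beyond plain online Elo (e.g. toward Bradley-Terry-style fitting), so treat this as the idea, not the exact leaderboard computation.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-model probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one pairwise vote.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k is the step size (assumed value; the real system may differ).
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (outcome - e_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return r_a_new, r_b_new

# One vote where A beats B, both starting at an assumed base of 1000:
r_a, r_b = elo_update(1000.0, 1000.0, 1.0)
```

A tie (`outcome=0.5`) between equally rated models leaves both ratings unchanged, which is why tie records still matter: ties between unequally rated models pull the ratings together.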
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | — | OpenAI | 1292.0 | — | Code Assistant, Small, Text Generation, Multimodal, Proprietary |
| 2 | — | OpenAI | 1270.0 | — | Text Generation, Small, Multimodal, Reasoning, Proprietary |