Chatbot Arena ELO
Chatbot Arena is a benchmark website that uses crowdsourced human preference votes to rank LLMs via an Elo rating system. Models are compared pairwise by anonymous human judges.
About this test
- What it measures
- Overall human preference for response quality in open-ended conversation.
- How it was administered
- Pairwise blind comparisons; crowdsourced votes from LMSYS Chatbot Arena; Elo ratings calculated from win/loss/tie records.
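The rating mechanism described above can be sketched with the classic Elo update rule. This is an illustrative sketch only: the constants (base rating, K-factor) are assumptions, and Chatbot Arena's published methodology has evolved beyond plain online Elo (e.g. toward Bradley-Terry-style fitting), so treat this as the idea, not the exact leaderboard computation.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-model probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one pairwise vote.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k is the step size (assumed value; the real system may differ).
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (outcome - e_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return r_a_new, r_b_new

# One vote where A beats B, both starting at an assumed base of 1000:
r_a, r_b = elo_update(1000.0, 1000.0, 1.0)
```

A tie (`outcome=0.5`) between equally rated models leaves both ratings unchanged, which is why tie records still matter: ties between unequally rated models pull the ratings together.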
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | — | OpenAI | 1292.0 | — | Code Assistant, Small, Text Generation, Multimodal, Proprietary |
| 2 | — | OpenAI | 1270.0 | — | Text Generation, Small, Multimodal, Reasoning, Proprietary |