Trending & What's New

The latest in AI models, benchmarks, and rankings.

Top Performers

GPT-o1

OpenAI

93.5 perf, ★ 4.6 (520 reviews)

ElevenLabs

93.0 perf, ★ 4.7 (2400 reviews)

GPT-4o

OpenAI

92.5 perf, ★ 4.7 (1240 reviews)

Voyage 3

Voyage AI

92.5 perf, ★ 4.6 (310 reviews)

DeepSeek R1

DeepSeek

92.0 perf, ★ 4.6 (480 reviews)

Recently Updated

Model	Provider	Performance	Updated
Grok 2	xAI	89.0	Today
Mistral Large	Mistral AI	87.1	Today
Llama 3.1 405B	Meta	88.4	Today
GPT-4o mini	OpenAI	85.0	Today
GPT-4o	OpenAI	92.5	Today

Benchmarks

VBench

VBench is a comprehensive benchmark for video generation models evaluating quality, consistency, and prompt alignment.

Added 3 months ago

MOS

Mean Opinion Score rates speech synthesis quality on a 1-5 scale, normalized to 0-100 for this platform.

Added 3 months ago
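
The page does not say which mapping it uses to bring MOS onto a 0-100 scale. As a minimal sketch, the snippet below assumes a linear min-max rescaling in which a MOS of 1 maps to 0 and a MOS of 5 maps to 100. The function name `normalize_mos` is hypothetical and only illustrates the idea.

```python
def normalize_mos(mos: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Linearly rescale a MOS value from [lo, hi] to [0, 100].

    The min-max mapping is an assumption; the platform's actual
    normalization is not documented on this page.
    """
    if not lo <= mos <= hi:
        raise ValueError(f"MOS must be within [{lo}, {hi}], got {mos}")
    return (mos - lo) / (hi - lo) * 100.0


if __name__ == "__main__":
    print(normalize_mos(4.0))  # 75.0 under the assumed min-max mapping
```

If the platform instead divides by 5 (so that a MOS of 4 becomes 80), the same raw score would land at a different point on the 0-100 scale, which is worth keeping in mind when comparing against externally reported MOS values.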

WER (inverted)

Word Error Rate measures speech recognition accuracy. Shown here as accuracy (100 - WER) so higher is better.

Added 3 months ago
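
The inversion described above can be sketched as follows. WER is computed here in the usual way, as the word-level edit distance divided by the number of reference words, and the displayed score is `100 - WER`. The function names are hypothetical, and flooring the result at 0 is an assumption (WER can exceed 100% when a hypothesis contains many insertions); the page does not say how such cases are handled.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the edit-distance row for the previous prefix of the reference.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref) * 100.0


def inverted_wer(wer_percent: float) -> float:
    """Accuracy-style score: 100 - WER, floored at 0 (the floor is an assumption)."""
    return max(0.0, 100.0 - wer_percent)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(round(wer, 1), round(inverted_wer(wer), 1))
```

In the usage example one of six reference words is missing from the hypothesis, so the WER is one sixth and the inverted score is the remaining five sixths, expressed in percent.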

MTEB

Massive Text Embedding Benchmark evaluates embeddings across 8 tasks: bitext mining, classification, clustering, pair classification, reranking, retrieval, semantic textual similarity (STS), and summarization.

Added 3 months ago

DPG-Bench

Dense Prompt Graph Benchmark evaluates image generation models on complex, detailed text prompts with multiple requirements.

Added 3 months ago

GenEval

GenEval evaluates compositional text-to-image generation across attributes like color, shape, position, and counting.

Added 3 months ago

LiveCodeBench

LiveCodeBench evaluates code generation on competitive programming problems released after model training cutoffs.

Added 3 months ago
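
The idea of evaluating only on problems released after a model's training cutoff can be sketched as a simple date filter. Everything in the snippet is illustrative: the `Problem` class, the `problems_after_cutoff` function, the problem IDs, and the dates are hypothetical and are not taken from LiveCodeBench itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    problem_id: str   # hypothetical identifier
    released: date    # date the problem was first published


def problems_after_cutoff(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems published strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


if __name__ == "__main__":
    # Made-up problems and cutoff, purely for illustration.
    pool = [
        Problem("p-001", date(2023, 5, 1)),
        Problem("p-002", date(2024, 2, 15)),
        Problem("p-003", date(2024, 6, 30)),
    ]
    cutoff = date(2023, 12, 31)
    print([p.problem_id for p in problems_after_cutoff(pool, cutoff)])
```

Because each model has its own cutoff, the evaluated subset can differ from model to model under this scheme, so scores are most comparable when computed over a shared time window.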

SWE-bench Verified

SWE-bench Verified is a human-validated subset of real GitHub issues from popular Python repositories, testing end-to-end software engineering.

Added 3 months ago

Latest from the Blog
