The latest in AI models, benchmarks, and rankings.
| Model | Provider | Performance | Updated |
|---|---|---|---|
| Grok 2 | xAI | 89.0 | Today |
| Mistral Large | Mistral AI | 87.1 | Today |
| Llama 3.1 405B | Meta | 88.4 | Today |
| GPT-4o mini | OpenAI | 85.0 | Today |
| GPT-4o | OpenAI | 92.5 | Today |
VBench
VBench is a comprehensive benchmark for video generation models evaluating quality, consistency, and prompt alignment.
Added 3 weeks ago
MOS
Mean Opinion Score rates speech synthesis quality on a 1-5 scale, normalized to 0-100 for this platform.
Added 3 weeks ago
WER (inverted)
Word Error Rate measures speech recognition accuracy. Shown here as accuracy (100 - WER) so higher is better.
Added 3 weeks ago
MTEB
Massive Text Embedding Benchmark evaluates embeddings across 8 tasks: classification, clustering, pair classification, reranking, retrieval, STS, summarization.
Added 3 weeks ago
DPG-Bench
Dense Prompt Graph Benchmark evaluates image generation models on complex, detailed text prompts with multiple requirements.
Added 3 weeks ago
GenEval
GenEval evaluates compositional text-to-image generation across attributes like color, shape, position, and counting.
Added 3 weeks ago
LiveCodeBench
LiveCodeBench evaluates code generation on competitive programming problems released after model training cutoffs.
Added 3 weeks ago
SWE-bench Verified
SWE-bench Verified is a human-validated subset of real GitHub issues from popular Python repositories, testing end-to-end software engineering.
Added 3 weeks ago
AI News Roundup: DeepSeek V4 Goes Open Source, Google Drops Gemma 4, OpenAI Closes $122B Round
DeepSeek releases a trillion-parameter open-source model, Google launches Gemma 4 under Apache 2.0, OpenAI closes the largest funding round in history at $122B, and California sets a national testing ground for AI regulation.
Baus AI · Today
AI News Roundup: Google Launches Gemma 4, OpenAI Shares Go Cold, Alibaba Drops Qwen3.6-Plus
Google releases Gemma 4 open models for agentic AI, investors dump $600M in OpenAI shares with no buyers as Anthropic heats up, and Alibaba unveils Qwen3.6-Plus with 1M-token context.
Baus AI · Yesterday
AI News Roundup: Anthropic’s Code Leak Fallout, OpenAI Investors Flee to Anthropic, California Defies Federal AI Order
Anthropic scrambles after its Claude Code source leaks via npm, OpenAI shares go unsold as $2B floods into Anthropic, and California fires back at Trump’s AI preemption order.
Baus AI · 2 days ago
AI News Roundup: Claude Code Source Leak, Oracle Cuts 30K Jobs, GPT-5.4 Goes Agentic
Anthropic accidentally leaks Claude Code’s full source via npm, Oracle slashes 30,000 jobs to fund $156B AI data center bet, and Q1 2026 venture funding hits a record $297 billion.
Baus AI · 3 days ago
AI News Roundup: Anthropic’s Mythos Leak, Reddit’s Bot Crackdown, White House AI Framework
Anthropic’s leaked Mythos model raises cybersecurity alarms, Reddit rolls out bot verification today, the White House drops a national AI policy framework, and Anthropic eyes a $60B IPO.
Baus AI · 4 days ago