SWE-bench Verified
SWE-bench Verified is a human-validated subset of real GitHub issues from popular Python repositories, testing end-to-end software engineering.
About this test
- What it measures
  - Real-world software engineering ability: understanding issues, navigating codebases, and writing patches.
- How it was administered
  - Models receive a GitHub issue and its repository, and must produce a git patch that resolves the issue and passes the repository's tests.
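The pass/fail criterion above can be sketched in a few lines. This is a hedged illustration, not the official harness: the function name `is_resolved` and the abstracted `test_results` mapping are assumptions, though the FAIL_TO_PASS / PASS_TO_PASS split (tests the fix must make pass, and tests that must keep passing) mirrors the benchmark's convention.

```python
# Hedged sketch of how a SWE-bench-style harness decides "resolved"
# after applying the model's patch and running the test suite.
# `test_results` maps test IDs to "PASSED"/"FAILED" outcomes.

def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """A patch resolves the issue only if the tests targeted by the fix
    now pass AND previously passing tests still pass (no regressions)."""
    return (
        all(test_results.get(t) == "PASSED" for t in fail_to_pass)
        and all(test_results.get(t) == "PASSED" for t in pass_to_pass)
    )

# Example: the targeted test passes and no regression is introduced.
results = {"test_bugfix": "PASSED", "test_existing": "PASSED", "test_unrelated": "FAILED"}
print(is_resolved(results, ["test_bugfix"], ["test_existing"]))  # True

# Example: the patch breaks a previously passing test.
results["test_existing"] = "FAILED"
print(is_resolved(results, ["test_bugfix"], ["test_existing"]))  # False
```

A submission scores as resolved only when both conditions hold; a patch that fixes the issue but breaks unrelated tests does not count.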
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | Anthropic | 49.0 | p92 | Autonomous, Multimodal, Proprietary |
| 2 | | Cognition | 41.5 | p88 | AI Agent, Autonomous, Code Assistant, Proprietary |
| 3 | | | | | |