SWE-bench Pro Leaderboard

The hardest coding benchmark for AI. Real GitHub issues, multi-file diffs, production repositories — not memorized answers. 31 models ranked.

Last updated: June 17, 2026 · 🆕 Claude Fable 5 added at #1 (80.3%)

Columns: Rank · Model · Provider · SWE-bench Pro (%) · SWE-bench Verified (%) · Size · License · $ per 1M output tokens · Release date · Source
About SWE-bench Pro: Tests whether an AI model can resolve real GitHub issues end-to-end. SWE-bench Verified is contaminated (see OpenAI's Feb 2026 withdrawal). Pro, by contrast, uses actively maintained repositories whose ground-truth solutions have not leaked publicly.
🆕 Claude Fable 5 (June 9, 2026): Anthropic's first publicly available Mythos-class model. It scores 80.3% on Pro, 11.1 points above Opus 4.8. It is the same underlying model as Claude Mythos 5, which remains restricted. Queries flagged by safety classifiers as cyber, bio, or chemistry related fall back to Opus 4.8. Pricing: $10 input / $50 output per 1M tokens.
Scores are vendor-reported unless otherwise noted; "—" means not published. Pricing notes: DeepSeek carries a permanent 75% discount, and Qwen a 50% promotional discount.

Test these models on real code

20+ LLMs on CodingFleet. Side-by-side testing on your own repos. Benchmarks are a compass, not a map.
