SWE-bench Pro Leaderboard
The hardest coding benchmark for AI. Real GitHub issues, multi-file diffs, production repositories — not memorized answers. 31 models ranked.
Last updated: June 17, 2026 · 🆕 Claude Fable 5 added at #1 (80.3%)
🆕 Claude Fable 5 (June 9, 2026): Anthropic's first publicly available Mythos-class model. It scores 80.3% on SWE-bench Pro, 11.1 points above Opus 4.8. It is the same underlying model as Claude Mythos 5 (restricted). Queries flagged by its cyber, bio, or chemistry safety classifiers fall back to Opus 4.8. Pricing: $10/$50 per 1M tokens.
Scores: Vendor-reported unless otherwise noted. "—" means not published. DeepSeek: permanent 75% discount. Qwen: 50% promo.
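To make the per-1M-token prices and discounts easier to compare, here is a minimal Python sketch of the arithmetic. It assumes a price pair such as $10/$50 means input price / output price, and that a discount applies uniformly to both. Neither assumption is confirmed here, and the function name and parameters are hypothetical.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m,
                      discount=0.0):
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumption: `discount` is a fraction (0.75 for a 75% discount) applied
    to both input and output prices. Check the vendor's pricing page before
    relying on this.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    # Prices are quoted per 1,000,000 tokens, so scale token counts down.
    base = (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m
    return base * (1.0 - discount)


# Hypothetical request: 200k input tokens, 50k output tokens at $10/$50.
cost = estimate_cost_usd(200_000, 50_000, 10.0, 50.0)
print(f"${cost:.2f}")
```

Passing `discount=0.75` or `discount=0.5` models the DeepSeek and Qwen notes above, if the listed prices are pre-discount.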
Test these models on real code
20+ LLMs on CodingFleet. Side-by-side testing on your own repos. Benchmarks are a compass, not a map.