SWE-bench Pro Leaderboard
The hardest coding benchmark for AI. Real GitHub issues, multi-file diffs, production repositories — not memorized answers. 30 models ranked.
Last updated: June 8, 2026 · Sources linked per model
Scores: Vendor-reported unless otherwise noted. "—" means not published. Qwen scores are taken from official Qwen blog posts.
Pricing notes: DeepSeek prices reflect a permanent 75% discount. Qwen prices reflect a 50% promotional discount.
Release dates: Sourced from official announcements and vendor blogs.
Test these models on real code
20+ LLMs on CodingFleet. Side-by-side testing on your own repos. Benchmarks are a compass, not a map.