SWE-bench Pro Leaderboard

The hardest coding benchmark for AI. Real GitHub issues, multi-file diffs, production repositories — not memorized answers. 30 models ranked.

Last updated: June 8, 2026 · Sources linked per model

Columns: # · Model · Provider · Pro · Verified · Size · License · $/1M output tokens · Released · Source
About SWE-bench Pro: Tests whether an AI model can resolve real GitHub issues end-to-end. Unlike SWE-bench Verified (contaminated — see OpenAI's Feb 2026 withdrawal), Pro uses actively maintained repositories with no public ground-truth leakage.
Scores: Vendor-reported unless otherwise noted. "—" means not published. Qwen scores are taken from Qwen's official blogs. Pricing: DeepSeek has a permanent 75% discount; Qwen has a 50% promotional discount.
Release dates: Sourced from official announcements and vendor blogs.
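
To make the discount notes concrete, here is a minimal Python sketch of how a percentage discount changes a price per 1M output tokens. The discount rates come from the notes above. The list price is a hypothetical placeholder, not a figure from this leaderboard, and the page does not say whether the $/1M output column already reflects these discounts, so treat this as arithmetic only.

```python
def effective_price(list_price_per_million: float, discount: float) -> float:
    """Return the price per 1M output tokens after a fractional discount.

    discount is a fraction between 0 and 1 (0.75 means 75% off).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return list_price_per_million * (1.0 - discount)


# Discount rates as stated in the notes above.
DISCOUNTS = {"DeepSeek": 0.75, "Qwen": 0.50}

# Hypothetical list price in USD per 1M output tokens (an assumption
# for illustration; not taken from the leaderboard).
HYPOTHETICAL_LIST_PRICE = 4.00

for provider, discount in DISCOUNTS.items():
    price = effective_price(HYPOTHETICAL_LIST_PRICE, discount)
    print(f"{provider}: ${price:.2f} per 1M output tokens")
```

At the same hypothetical list price, the 75% discount leaves a quarter of the cost and the 50% promo leaves half, so discounted and undiscounted prices should not be compared directly.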

Test these models on real code

20+ LLMs on CodingFleet. Side-by-side testing on your own repos. Benchmarks are a compass, not a map.
