🆕 Updated June 9, 2026: Claude Fable 5 launched with 93.9% on SWE-bench Verified, 80.3% on SWE-bench Pro, and 88.0% on Terminal-Bench 2.1. It is the first Mythos-class model available to everyone, the first model to break 90% on Verified, the first to break 80% on Pro, and the first to break 85% on Terminal-Bench, taking the crown from GPT-5.5, which had led CLI coding at 83.4%. Pricing is $10/$50 per 1M tokens (input/output). This is the story of a 27-month stretch of unusually rapid progress in AI coding.

🆕 Claude Fable 5 — The Mythos Milestone

June 9, 2026: Anthropic's first publicly available Mythos-class model, built on the same underlying model as the restricted Claude Mythos 5. Scores: 93.9% SWE-bench Verified (+5.3 over Opus 4.8), 80.3% SWE-bench Pro (+11.1 over Opus 4.8), 88.0% Terminal-Bench 2.1 (+4.6 over GPT-5.5, the former CLI king). First model above 90% Verified and first above 80% Pro. Price: $10/$50 per 1M tokens (input/output). Queries that trip the safety classifiers for cyber, bio, or chemistry fall back to Opus 4.8 (~5% of sessions).

In March 2024, Claude 3 Opus scored 33.4% on SWE-bench Verified, the best available, at $75 per million output tokens. Twenty-seven months later, Claude Fable 5 scores 93.9%, a 60.5-point leap. Prices, however, have diverged: OpenAI's GPT-5.5 doubled to $30, Anthropic launched Fable 5 at $50, and DeepSeek V4 Pro collapsed to $0.87. The most expensive frontier model now costs 57× the cheapest.

📊 Key Milestones

  • Mar 2024: Claude 3 Opus scores 33.4% on SWE-bench Verified at $75/1M output. The starting line.
  • Oct 2025: Claude Opus 4.5 breaks 80% for the first time (80.9%). Anthropic cuts Opus price from $75 to $25.
  • Feb 2026: OpenAI stops reporting Verified scores, citing contamination. SWE-bench Pro becomes the trusted benchmark.
  • Apr 2026: Claude Opus 4.7 at 87.6%. GPT-5.5 launches and dominates Terminal-Bench at 83.4% — a record that stands until Fable 5.
  • May 2026: DeepSeek makes 75% discount permanent: $0.87/1M. Claude Opus 4.8 at 88.6% Verified, 69.2% Pro.
  • 🆕 Jun 9, 2026: Claude Fable 5 becomes the first model above 90% Verified (93.9%), above 80% Pro (80.3%), and above 85% Terminal-Bench (88.0%), dethroning GPT-5.5 as CLI king. Its $50 price is 57× DeepSeek V4 Pro's $0.87.

The SWE-bench Verified Ascent: From 33.4% to 93.9%

SWE-bench Verified — 500 real GitHub issues from 12 Python repositories — has been the industry's north star since August 2024. Claude has held the record at all 8 competitive checkpoints:

SWE-bench Verified ascent: 33.4% to 93.9%
| Date | Record-Setting Model | SWE-bench Verified | Significance |
|---|---|---|---|
| Mar 2024 | Claude 3 Opus | 33.4% | Starting line. |
| Jun 2024 | Claude 3.5 Sonnet | 49.0% | First near 50%. |
| Feb 2025 | Claude 3.7 Sonnet | 62.3% | First to break 60%. |
| Oct 2025 | Claude Opus 4.5 | 80.9% | First to break 80%. Anthropic cuts Opus price 67%. |
| Feb 2026 | Claude Opus 4.6 | 80.8% | Verified saturating. OpenAI withdraws. |
| Apr 2026 | Claude Opus 4.7 | 87.6% | Largest gain since Opus 4.5 (+6.8). |
| May 2026 | Claude Opus 4.8 | 88.6% | Former record. |
| 🆕 Jun 2026 | Claude Fable 5 | 93.9% | First above 90%. Mythos-class. |
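
The point gains between checkpoints are simple subtractions, and it can help to see them side by side. The sketch below is illustrative only: the scores are copied from the table above, while the names `CHECKPOINTS` and `point_gains` are hypothetical helpers introduced here, not part of any benchmark tooling.

```python
# SWE-bench Verified scores (percent), copied from the table above in date order.
CHECKPOINTS = [
    ("Mar 2024", "Claude 3 Opus", 33.4),
    ("Jun 2024", "Claude 3.5 Sonnet", 49.0),
    ("Feb 2025", "Claude 3.7 Sonnet", 62.3),
    ("Oct 2025", "Claude Opus 4.5", 80.9),
    ("Feb 2026", "Claude Opus 4.6", 80.8),
    ("Apr 2026", "Claude Opus 4.7", 87.6),
    ("May 2026", "Claude Opus 4.8", 88.6),
    ("Jun 2026", "Claude Fable 5", 93.9),
]


def point_gains(checkpoints):
    """Return (model, gain) pairs: each score minus the previous checkpoint's score."""
    gains = []
    for (_, _, prev), (_, model, score) in zip(checkpoints, checkpoints[1:]):
        # Round to one decimal to hide floating-point noise in the subtraction.
        gains.append((model, round(score - prev, 1)))
    return gains


gains = point_gains(CHECKPOINTS)
# Overall change from the first checkpoint to the last.
total = round(CHECKPOINTS[-1][2] - CHECKPOINTS[0][2], 1)

for model, gain in gains:
    print(f"{model}: {gain:+.1f} points")
print(f"Total: {total:+.1f} points")
```

Gains are in percentage points, not relative percent, which matches how the article quotes figures such as +5.3 and the 60.5-point leap.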

The Three-Lane Race: Claude vs GPT-5.5

By late 2025, a single benchmark was no longer sufficient. SWE-bench Pro launched as the contamination-resistant successor. Terminal-Bench emerged as the standard for CLI coding. GPT-5.5 dominated Terminal-Bench from April 2026 at 83.4% — until Fable 5 arrived at 88.0% and took the crown. Here's the full timeline:

[Figure: Three-lane race, Claude vs GPT-5.5]

Claude owns Verified (yellow) and Pro (green). GPT-5.5 (♦ markers) entered in April 2026 and immediately led Terminal-Bench at 83.4% — holding that position for 47 days until Fable 5's 88.0% in June. Fable 5 now leads all three lanes.

Fable 5 vs Previous Champions

| Benchmark | Claude Fable 5 | Previous Best | Gap (points) |
|---|---|---|---|
| SWE-bench Pro | 80.3% | Opus 4.8 (69.2%) | +11.1 |
| Terminal-Bench 2.1 | 88.0% | GPT-5.5 (83.4%) | +4.6 |
| SWE-bench Verified | 93.9% | Opus 4.8 (88.6%) | +5.3 |
| GPQA Diamond | 94.5% | Gemini 3.1 Pro (94.3%) | +0.2 |
| HLE (no tools) | 56.8% | Gemini 3.1 Pro (44.4%) | +12.4 |
| FrontierCode Diamond | 29.3% | Opus 4.8 (13.4%) | +15.9 |

Pricing: Premium $50, Stable $30, Freefall $0.87

Three strategies are diverging: OpenAI is holding at $30, Anthropic has launched a premium $50 Mythos tier, and open-weight pricing has collapsed below $1.

Pricing divergence
| Date | Highest-Priced | $/1M | Lowest (Frontier) | $/1M | Spread |
|---|---|---|---|---|---|
| Mar 2024 | Claude 3 Opus | $75.00 | | | |
| Jan 2025 | Claude Opus 4.1 | $75.00 | DeepSeek V3 | $1.10 | 68× |
| Oct 2025 | Claude Opus 4.5 | $25.00 | DeepSeek V3.2 | $0.42 | 60× |
| May 2026 | GPT-5.5 | $30.00 | DeepSeek V4 Pro | $0.87 | 34× |
| 🆕 Jun 2026 | Claude Fable 5 | $50.00 | DeepSeek V4 Pro | $0.87 | 57× |
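
The spread column appears to be the highest price divided by the lowest, rounded to a whole multiple. A minimal sketch of that arithmetic follows, assuming nearest-integer rounding (the article does not state its rounding rule). The prices are copied from the table above, and `PRICES` and `spread` are hypothetical names used only for illustration.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
# Each row: (date, highest-priced model, its price, lowest-priced frontier model, its price)
PRICES = [
    ("Jan 2025", "Claude Opus 4.1", 75.00, "DeepSeek V3", 1.10),
    ("Oct 2025", "Claude Opus 4.5", 25.00, "DeepSeek V3.2", 0.42),
    ("May 2026", "GPT-5.5", 30.00, "DeepSeek V4 Pro", 0.87),
    ("Jun 2026", "Claude Fable 5", 50.00, "DeepSeek V4 Pro", 0.87),
]


def spread(highest: float, lowest: float) -> int:
    """Ratio of highest to lowest price, rounded to the nearest whole multiple.

    Nearest-integer rounding is an assumption that happens to reproduce
    the table's figures; it is not a documented rule.
    """
    return round(highest / lowest)


spreads = {date: spread(high, low) for date, _, high, _, low in PRICES}

for date, multiple in spreads.items():
    print(f"{date}: {multiple}x")
```

The March 2024 row is left out because the table lists no lowest-priced frontier model for that date, so no ratio can be formed.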

The 27 Months That Changed Everything

In March 2024, AI coding was an experiment: 33.4% Verified, $75/1M, no open-weight alternatives. In June 2026, AI coding is infrastructure. The best model scores 93.9% Verified, 80.3% Pro, 88.0% Terminal-Bench. Open-weight delivers frontier-adjacent coding at $0.87/1M. Premium Mythos-class capability costs $50/1M. GPT-5.5's 47-day Terminal-Bench reign is over. Fable 5 leads every lane.


Sources: Anthropic Fable 5 Announcement (Jun 9, 2026) | Vals.ai | SWE-bench Official | DeepSeek V4 Model Card. Fable 5 and Mythos 5 share the same underlying model. The Verified score comes from the Mythos Preview system card; Anthropic has not published a separate Fable 5 Verified score. ⚠️ OpenAI reported SWE-bench Verified as contaminated (Feb 2026).