Claude Fable 5 vs Claude Sonnet 5: Full Benchmark Comparison (July 2026)

July 1, 2026. Two Claude models, two tiers, one question: Mythos or Sonnet? Claude Fable 5 — Anthropic's first publicly available Mythos-class model — sits at the top with 80.3% SWE-bench Pro and an $50/1M output price. Claude Sonnet 5 — released June 30, just one day ago — delivers 63.2% Pro at $15/1M (intro $10 through August 31). That's 79% of Fable 5's coding capability at 30% of the price. But benchmarks are only part of the story. Here's the complete data-driven comparison — 9 benchmarks, pricing deep-dive, value analysis, and a clear verdict on which model to use when.

🔮 Key Findings

Fable 5 leads all 8 directly comparable benchmarks. SWE-bench Pro: +17.1. Terminal-Bench: +7.6. HLE no tools: +13.6. Average gap: +8.2 points across all shared evaluations.
Sonnet 5 costs 70% less. $15/1M output vs $50/1M. Introductory pricing at $10/1M through August 31 makes it 5× cheaper than Fable 5.
Sonnet 5 is 2.2× better value per Pro point. $0.24/point (standard) vs Fable 5's $0.62/point. At intro batch pricing: $0.08/point vs Fable's $0.31/point — nearly 4× better.
Sonnet 5 beats Fable 5 on FrontierCode v1 (38.8% vs 29.3% Diamond). Different test versions, but Sonnet's production code quality is remarkably close to Mythos territory.
Fable 5 just returned from a 19-day export suspension. Access restored July 1, 2026. Sonnet 5 launched during the ban and is already production-proven.

Also see: Fable 5 Complete Review · Sonnet 5 vs Sonnet 4.6 · SWE-bench Pro Leaderboard · Terminal-Bench Leaderboard · Pricing Calculator.

The Context: Two Models, One Month, One Company

Anthropic released Fable 5 on June 9, 2026 — their first publicly available Mythos-class model. It was immediately the most capable AI model ever made generally available. Three days later, on June 12, a US export control directive forced Anthropic to suspend access for all users. For 19 days, Fable 5 was offline.

During that window, Anthropic launched Claude Sonnet 5 (June 30) — a model that, according to their system card, "performs close to Opus 4.8 at lower prices." It was not blocked by export controls because its cybersecurity capabilities are deliberately limited. The timing is significant: Sonnet 5 became the best available Claude model for most of the developer ecosystem while Fable was unavailable.

Now, on July 1, both models are live. Fable 5 is back. Sonnet 5 is production-proven. Which one should you use? Let's look at every number.

Head-to-Head: Every Comparable Benchmark

Below is the most comprehensive side-by-side comparison available as of July 1, 2026. All scores are Anthropic-reported from the respective system cards, using max effort / adaptive thinking at default sampling settings. Purple cells indicate the leader.

Claude Fable 5 vs Sonnet 5 benchmark comparison bar chart

Benchmark	Claude Fable 5	Claude Sonnet 5	Gap	What It Measures
SWE-bench Pro	80.3%	63.2%	+17.1	Real GitHub issues, multi-file, contamination-resistant
Terminal-Bench 2.1	88.0%	80.4%	+7.6	CLI coding, shell, build systems
HLE (no tools)	56.8%	43.2%	+13.6	Multidisciplinary expert reasoning
HLE (with tools)	64.5%	57.4%	+7.1	Expert reasoning + tool orchestration
OSWorld-Verified	85.0%	81.2%	+3.8	Computer use, GUI navigation
AutomationBench	17.4%	13.5%	+3.9	Tool use and workflow automation
HealthBench Professional	66.0%	57.8%	+8.2	Clinical accuracy across 26 specialties
Legal Agent Benchmark	13.3%	8.9%	+4.4	Legal document analysis and reasoning
FrontierCode	29.3% (Diamond)	38.8% (v1)	—	Production-quality code (different test versions — not directly comparable)
GDPval-AA	1932 (v1)	1618 (v2)	—	Knowledge work (different test versions — not directly comparable)

Sources: Fable 5 System Card · Sonnet 5 System Card. All scores Anthropic-reported with max effort / adaptive thinking at default sampling. FrontierCode and GDPval-AA use different benchmark versions (Diamond vs v1, v1 vs v2) and are not directly comparable.

Pricing: 3.3× Cheaper, 2.2× Better Value

This is where the comparison gets interesting. Fable 5 delivers more capability — but Sonnet 5 delivers dramatically better value.

	Claude Fable 5	Claude Sonnet 5	Winner
Input $/1M	$10.00	$3.00	Sonnet 5
Output $/1M	$50.00	$15.00	Sonnet 5
Intro Output $/1M	$50.00	$10.00	Sonnet 5 (thru Aug 31)
Batch Output $/1M	$25.00	$7.50	Sonnet 5
Cached Input $/1M	$1.00	$0.30	Sonnet 5
Context Window	1M tokens	1M tokens	Tie
Batch/Flex Discount	✅ 50%	✅ 50%	Tie
Prompt Caching	✅ 90%	✅ 90%	Tie
Subscription Access	Credits required	Default on Free/Pro	Sonnet 5

Cost per SWE-bench Pro Point

The clearest value metric: how much does each point of coding capability cost you?

Configuration	Fable 5	Sonnet 5	Sonnet Advantage
Standard list price	$0.62/pt	$0.24/pt	2.6× better
Batch pricing	$0.31/pt	$0.12/pt	2.6× better
Intro + Batch (thru Aug 31)	$0.31/pt	$0.08/pt	3.9× better

Real-World Coding Session Cost

A typical heavy coding session — 3M input + 1M output tokens:

Model	Standard	Batch	100 Sessions/mo (Batch)
Claude Fable 5	$80.00	$40.00	$4,000
Claude Sonnet 5	$24.00	$12.00	$1,200
Claude Sonnet 5 (intro)	$19.00	$9.50	$950

At batch pricing, Sonnet 5 costs $1,200/month for 100 heavy sessions — 70% less than Fable 5 at $4,000. For teams running high-volume coding agents, the savings compound fast.

Claude model value efficiency cost per Pro point

What Each Gap Actually Means

SWE-bench Pro: The 17.1-Point Chasm

This is the single most important number. SWE-bench Pro tests real GitHub issue resolution — reading unfamiliar codebases, making multi-file changes, and passing test suites with no answer leakage. Fable 5 at 80.3% vs Sonnet 5 at 63.2% is a 17.1-point gap — larger than Sonnet 5's lead over GPT-5.5 (+4.6 points) and Sonnet 4.6 (+5.1 points).

What does this mean in practice? On complex multi-file bugs where Sonnet 5 succeeds roughly 2 out of 3 times, Fable 5 succeeds 4 out of 5 times. That's the difference between "usually works" and "you can trust it." For solo developers, that extra reliability may be worth the premium. For teams running verification passes, Sonnet 5 at 2.6× better value is hard to beat.

Terminal-Bench 2.1: The Narrowest Gap

Fable 5: 88.0%. Sonnet 5: 80.4%. A 7.6-point gap — the closest of any major benchmark. Both models are excellent at CLI operations: installing packages, debugging builds, managing git, configuring servers. Sonnet 5's 80.4% is just 2.3 points behind Opus 4.8 (82.7%) and ahead of GPT-5.5's 78.2% (per tbench.ai). For DevOps automation and terminal-based agent work, Sonnet 5 is essentially Opus-class.

Humanity's Last Exam: The Reasoning Divide

Without tools: Fable 5 56.8% vs Sonnet 5 43.2% — a 13.6-point gap. With tools: Fable 5 64.5% vs Sonnet 5 57.4% — a 7.1-point gap. Fable 5's advantage shrinks when both models have access to tools, suggesting the gap is partly about raw knowledge breadth (where Fable's larger model size matters more) and partly about reasoning depth (where tools help level the field).

OSWorld: The Near-Tie

Fable 5: 85.0%. Sonnet 5: 81.2%. A 3.8-point gap — the tightest of any benchmark. Both models are exceptionally capable at GUI navigation, desktop automation, and computer use tasks. Sonnet 5 is within 2.2 points of Opus 4.8 (83.4%) on this benchmark. For browser automation and computer-use agents, Sonnet 5 is essentially indistinguishable from Anthropic's top-tier models.

FrontierCode & GDPval-AA: The Version Caveat

These benchmarks use different versions across the two models and cannot be directly compared. Fable 5's 29.3% on FrontierCode Diamond tests production-quality code on novel problems — the hardest coding benchmark available. Sonnet 5's 38.8% is on FrontierCode v1, a different (and likely easier) test set. Similarly, Fable 5's 1932 on GDPval-AA v1 and Sonnet 5's 1618 on v2 use different question sets and scoring methodologies. We include them for completeness but they don't inform the head-to-head.

Claude Fable 5 vs Sonnet 5 capability radar chart

Safety, Access, and Production Readiness

	Claude Fable 5	Claude Sonnet 5
Safety Architecture	Cyber/bio/chemistry classifiers → falls back to Opus 4.8 on ~5% of queries	Deliberately limited cyber capabilities — no fallback needed
Export Status	Previously suspended (Jun 12–30); restored July 1	Never suspended — available continuously since launch
Production Stability	3 days of GA before suspension; just restored	1 day old but no regulatory headwinds
Data Retention	30-day mandatory retention (all business customers)	Standard Anthropic retention policy
Subscription	Credits required (was free on plans thru June 22)	Default model on Free and Pro plans
API Access	Available to all	Available to all
Rate Limits	Standard	Raised limits to accommodate heavier token usage

The production stability question is real. Fable 5 has already been suspended once. While the export control is now lifted, the regulatory precedent means another suspension is not impossible. Sonnet 5 was deliberately designed to avoid triggering the same controls — Anthropic limited its cybersecurity capabilities specifically so it could ship without restrictions. For teams building production pipelines, Sonnet 5's regulatory safety may matter more than Fable 5's capability edge.

Claude Fable 5 vs Sonnet 5 price vs performance scatter

The Tokenizer Tax: What You Actually Pay

Sonnet 5 uses an updated tokenizer that produces 1.0–1.35× more tokens than Sonnet 4.6 for the same input. Anthropic's introductory pricing ($2/$10 instead of $3/$15) is designed to make the transition roughly cost-neutral from Sonnet 4.6. But compared to Fable 5, the tokenizer difference means Sonnet 5 consumes slightly more tokens for the same prompt — partially offsetting its lower per-token price.

Key tokenizer ratios from the system card:

Content Type	Sonnet 4.6 Tokens	Sonnet 5 Tokens	Ratio
English text	2,356	3,341	1.42×
Spanish text	3,572	4,747	1.33×
Python code (4,279 lines)	44,014	56,113	1.27×
Chinese (Simplified)	3,334	3,360	1.01×

Source: Sonnet 5 System Card, tokenizer comparison table. Fable 5 uses the same tokenizer as Opus 4.8 (legacy tokenizer, comparable to Sonnet 4.6 ratios).

The practical impact: for a Python-heavy coding session, Sonnet 5's effective cost advantage over Fable 5 shrinks from 3.3× to roughly 2.6× after accounting for the tokenizer — still massive, but worth factoring in for high-volume pipelines.

Where They Fit in the Claude Lineup

Anthropic now has four active tiers, and the positioning is clearer than ever:

Tier	Model	Pro Score	Output $/1M	Best For
Mythos	Claude Fable 5	80.3%	$50.00	Hardest coding problems, long-horizon autonomy, research
Opus	Claude Opus 4.8	69.2%	$25.00	Complex engineering, calibrated honesty, unattended agents
Sonnet	Claude Sonnet 5	63.2%	$15.00	Daily coding, CLI agents, computer use, best value
Haiku	Claude Haiku 4.5	—	$5.00	Fast chat, simple tasks, high-volume pipelines

Sonnet 5 at 63.2% Pro is just 6 points behind Opus 4.8 (69.2%) at 60% of the price. It beats Opus on GDPval-AA v2 knowledge work (1618 vs 1615). It's within 2.2 points on OSWorld and 2.3 points on Terminal-Bench. For most developers, Sonnet 5 makes Opus 4.8 unnecessary. Fable 5 remains the aspirational ceiling — the model you reach for when the problem genuinely requires Mythos-class reasoning.

Verdict: Capability vs Value — Pick Your Priority

Use Case	Winner	Why
Hardest bugs (4+ files, novel codebase)	Fable 5	80.3% Pro — 17.1 points ahead. The gap widens on harder tasks.
CLI / DevOps automation	Fable 5	88.0% TB 2.1. But Sonnet at 80.4% is close enough for most teams.
Long-horizon autonomous tasks	Fable 5	Stripe 50M-line migration. Week-long genomics. Pokémon full playthrough.
Daily coding (routine bugs, PRs)	Sonnet 5	63.2% Pro is plenty. 3.3× cheaper. Default on Free/Pro plans.
Computer use / browser agents	Tie	85.0% vs 81.2% — barely distinguishable. Use Sonnet for cost.
Cost-sensitive high volume	Sonnet 5	$0.12/Pro point (batch) vs $0.31. 70% cheaper per session.
Production pipelines (no human review)	Sonnet 5	No export suspension risk. No safety fallbacks. Proven availability.
Scientific research / expert reasoning	Fable 5	56.8% HLE (no tools) vs 43.2%. 13.6-point gap in raw reasoning.
Legal / healthcare compliance	Fable 5	13.3% vs 8.9% (Legal). 66.0% vs 57.8% (HealthBench).
Best overall value	Sonnet 5	79% of Fable 5's coding capability at 30% of the price.

Conclusion: The Right Tool for the Right Job

Claude Fable 5 is the most capable AI model ever made publicly available. It leads every shared benchmark, often by double-digit margins. It completed Stripe's 50M-line migration in a day. It designs proteins better than humans. It plays Pokémon with vision alone. If you need maximum capability and can verify the output, there is no substitute.

Claude Sonnet 5 is the smarter choice for most developers, most of the time. At 63.2% Pro — just 6 points behind Opus 4.8 — it handles daily coding, CLI automation, and computer use at near-Opus quality for Sonnet prices. The introductory $10/1M output pricing (through August 31) makes it the best value in the Anthropic lineup by a wide margin. And unlike Fable 5, it was designed to ship without regulatory headwinds.

The honest recommendation: use both. Sonnet 5 for your daily driver — coding, PRs, terminal work, computer use. Fable 5 for the hard problems — complex multi-file bugs, long-horizon autonomous tasks, research, and problems where a 17-point Pro gap is the difference between success and failure. At 2.6× better value per Pro point, Sonnet 5 handles 80% of what Fable 5 can do for 30% of the cost. The remaining 20% — the genuinely hard problems — is where Mythos earns its premium.

⚡ Compare Fable 5 & Sonnet 5 on CodingFleet →

20+ LLMs. Side-by-side testing. Find which model handles your hardest problems.

Sources: Anthropic — Fable 5 & Mythos 5 System Card | Anthropic — Sonnet 5 System Card & Launch Announcement | Simon Willison — What's New in Sonnet 5 | DataCamp — Sonnet 5 Benchmarks & Pricing | Cosmic JS — Developer Guide | Morphllm — Claude Benchmarks. All benchmark scores Anthropic-reported unless otherwise noted. Pricing as of July 1, 2026. Sonnet 5 introductory pricing valid through August 31, 2026.