June 30, 2026. Anthropic drops Claude Sonnet 5 — and the numbers are kind of insane. It's a Sonnet-tier model priced at $3/$15 per million tokens that lands within a few points of Opus 4.8 ($5/$25) on every single benchmark. On knowledge work, it actually beats Opus. On HLE with tools, it's a dead heat. And it absolutely crushes its predecessor, Sonnet 4.6, by double-digit margins on half the benchmarks. Here's the complete comparison, sourced from Anthropic's Sonnet 5 System Card and the official launch announcement. Test both on CodingFleet.

TL;DR — Sonnet 5 vs Opus 4.8

  • Sonnet 5 beats Opus 4.8 on knowledge work: 1618 vs 1615 on GDPval-AA v2. First time a Sonnet out-scores Opus on any benchmark.
  • HLE with tools is a dead heat: 57.4% vs 57.9%. Within margin of error.
  • Opus leads SWE-bench Pro by 6 pts: 69.2% vs 63.2%. The biggest gap — but Sonnet is only 9% behind the flagship.
  • Sonnet costs 1.7× less: $3/$15 vs $5/$25 per MTok. Introductory $2/$10 through Aug 31, 2026.
  • Sonnet 5 crushes Sonnet 4.6: +5.1 Pro, +13.4 Terminal-Bench, +8.6 HLE, +10.6 HLE tools, +223 GDPval. The biggest Sonnet generation-over-generation leap ever.
  • Same 1M context, same 128K output: Identical specs to Opus. Faster latency. Adaptive thinking at high effort default.

Full Benchmark Comparison

BenchmarkSonnet 5Sonnet 4.6Opus 4.8 (Reference)Sonnet 5 vs Opus
Agentic coding (SWE-bench Pro)63.2%58.1%69.2%−6.0
Agentic coding (Terminal-Bench 2.1)80.4%67.0%82.7%−2.3
Reasoning (HLE, no tools)43.2%34.6%49.8%−6.6
Reasoning (HLE, with tools)57.4%46.8%57.9%−0.5 (tie)
Computer use (OSWorld-Verified)81.2%78.5%83.4%−2.2
Knowledge work (GDPval-AA v2)161813951615+3 (Wins!)

Source: All benchmark scores from Anthropic's Claude Sonnet 5 System Card, Table 8.1.A — Capability evaluation summary. All Sonnet 5 results use adaptive thinking at max effort, default sampling, averaged over 5 trials. Opus 4.8 and Sonnet 4.6 scores from the same evaluation harness (Anthropic's internal framework). Directly comparable.

The Historic First: Sonnet Beats Opus on Knowledge Work

GDPval-AA v2 measures applied knowledge work — the kind of tasks that knowledge workers do daily: analyzing documents, synthesizing information, producing structured outputs. Opus 4.8 scores 1615. Sonnet 5 scores 1618. It's a slim margin (+3), but it's the first time a Sonnet-class model has ever outscored the concurrent Opus flagship on any benchmark. Anthropic's own launch announcement: "its performance is close to that of Opus 4.8, but at lower prices." The System Card data shows it's not just close — on applied knowledge, it's ahead.

This is the metric that matters most for everyday professional use. If you're using Claude for document analysis, research synthesis, or knowledge-intensive tasks, Sonnet 5 gives you Opus 4.8-level quality at 60% of the cost. There is no tradeoff — you get better quality for less money.

HLE with Tools: The Dead Heat

Humanity's Last Exam with tools is the most realistic measure of how models perform when they can use browsers, terminals, and code execution to augment their reasoning. Opus 4.8 scores 57.9%. Sonnet 5 scores 57.4%. That's a 0.5-point difference — well within the margin of error for 5-trial averaging. On the hardest reasoning benchmark with tool access, these two models are functionally identical.

Without tools, Opus maintains a clearer lead (49.8% vs 43.2%, −6.6 pts). This suggests Opus still has an edge in raw reasoning capability — but when the models can compensate with tool use, that edge largely evaporates. For developers building agentic workflows, the practical takeaway is clear: Sonnet 5 with tools matches Opus 4.8 with tools on the hardest reasoning tasks.

Coding: Opus Leads, But the Gap Is Shrinking

On the two premier agentic coding benchmarks, Opus 4.8 still holds the lead — but the margins are surprisingly small for a $5/$25 flagship vs a $3/$15 mid-tier model.

SWE-bench Pro: −6.0 Points

The widest gap between the two models. Opus 4.8 at 69.2% vs Sonnet 5 at 63.2%. This 6-point spread reflects real differences in long-horizon bug-fixing capability across complex open-source repositories. Opus is still the go-to for the hardest GitHub issues. But Sonnet 5's 63.2% is a +5.1 point jump over Sonnet 4.6 (58.1%) — a meaningful generation-over-generation gain that puts it firmly in "capable" territory for professional SWE work.

For additional context: Cursor's independent CursorBench evaluation (using their production agent harness) scored Sonnet 5 at 61.2% vs Opus 4.8 at 63.8% — only a 2.6-point gap. Different harness, different tasks, but a consistent story: Sonnet 5 is within striking distance of the flagship on real-world coding tasks.

Terminal-Bench 2.1: −2.3 Points

Opus 4.8 at 82.7% vs Sonnet 5 at 80.4%. A 2.3-point gap. On terminal-based agentic coding — the kind of work developers do in actual shells — Sonnet 5 operates at ~97% of Opus capability. This is also a massive +13.4 point leap over Sonnet 4.6 (67.0%), confirming that Anthropic invested heavily in agentic capability for this release. Simon Willison noted: "its performance is close to that of Opus 4.8, but at lower prices."

Computer Use: −2.2 Points

OSWorld-Verified measures desktop automation — controlling browsers, clicking, typing, navigating UIs. Opus 4.8 at 83.4% vs Sonnet 5 at 81.2%. A 2.2-point difference. Both models comfortably beat the human expert baseline of 72.4%. For computer-use agents, Sonnet 5 delivers near-flagship automation reliability at Sonnet prices — one of the most compelling cost-performance ratios in the current model landscape.

Specification Comparison

FeatureClaude Sonnet 5Claude Opus 4.8
ReleasedJune 30, 2026May 28, 2026
API IDclaude-sonnet-5claude-opus-4-8
Context Window1,000,000 tokens1,000,000 tokens
Max Output128K (300K batch)128K (300K batch)
ThinkingAdaptive (effort: high default)Adaptive (effort: high default)
Extended ThinkingNoNo
Knowledge CutoffJan 2026Jan 2026
Comparative LatencyFastModerate
Pricing (API)$3 / $15 per MTok*$5 / $25 per MTok
TokenizerNew (Opus 4.7+ tokenizer)New (Opus 4.7+ tokenizer)

* Introductory pricing of $2/$10 per MTok through August 31, 2026. Sources: Claude Platform Docs — Models Overview, Anthropic — Introducing Claude Sonnet 5, Claude Sonnet 5 System Card.

The Tokenizer Caveat

Sonnet 5 uses the updated tokenizer that Anthropic introduced with Opus 4.7. The same text produces roughly 1.0× to 1.35× more tokens compared to Sonnet 4.6, depending on content type. Simon Willison's analysis quantified this: English text runs ~1.33–1.42× more tokens, Spanish ~1.33×, Python code ~1.27–1.28×, and Simplified Chinese is essentially unchanged at ~1.01×.

This matters for cost comparisons. When migrating from Sonnet 4.6, your effective per-request cost isn't simply $3/$15 vs the old $3/$15. A 1.3× token inflation means $3.90/$19.50 in real terms for English-heavy workloads. But that's still dramatically cheaper than Opus 4.8 at $5/$25 — and on the introductory $2/$10 pricing, even with inflation you're paying effective rates of ~$2.60/$13.00. Anthropic's announcement: "We've increased rate limits across all surfaces to accommodate the higher token usage of higher effort levels."

Sonnet 5 vs Sonnet 4.6: The Generation Leap

While the Opus comparison gets the headlines, the Sonnet 4.6 → Sonnet 5 upgrade is arguably more impressive. Every single benchmark improved — and several by double-digit margins:

BenchmarkSonnet 4.6Sonnet 5Gain
Terminal-Bench 2.167.0%80.4%+13.4
HLE (with tools)46.8%57.4%+10.6
HLE (no tools)34.6%43.2%+8.6
GDPval-AA v213951618+223
SWE-bench Pro58.1%63.2%+5.1
OSWorld-Verified78.5%81.2%+2.7

The +13.4 on Terminal-Bench and +10.6 on HLE with tools are structural improvements — not marginal gains. Anthropic's framing: "Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models." The data backs that up. Sonnet 5's Terminal-Bench score (80.4%) is higher than Opus 4.7 scored at launch — meaning a mid-tier model in July 2026 outperforms the flagship from early 2026 on agentic terminal coding.

Safety: Lower Hallucination, Better Refusal Boundaries

Mashable's coverage highlighted Anthropic's safety improvements: "Anthropic reports Sonnet 5 shows lower rates of hallucination, sycophancy, and other undesirable behaviors than its predecessor, along with improved resistance to prompt-injection attacks." The System Card confirms a lower rate of undesirable behaviors (cooperation with misuse, deception) than Sonnet 4.6.

On cybersecurity, the model was deliberately constrained: "Sonnet 5 never produced a full working exploit" on the Firefox 147 vulnerability benchmark — landing at 0%, well below Opus 4.8 by design. Anthropic ships Sonnet 5 with the same cyber safeguards enabled by default as Opus 4.7 and 4.8.

Should You Use Sonnet 5 or Opus 4.8?

If you...Decision
Do everyday knowledge work or document analysis✅ Sonnet 5. Beats Opus on GDPval. 60% cheaper.
Run agentic coding with terminal/browser tool use✅ Sonnet 5. 97% of Opus on TB 2.1, 97% on OSWorld.
Need max reasoning on the hardest problems (no tools)🔶 Opus 4.8. 49.8% vs 43.2% on HLE without tools.
Resolve complex multi-file GitHub issues🔶 Opus 4.8. 69.2% vs 63.2% on SWE-bench Pro.
Run high-volume production workloads✅ Sonnet 5. 1.7× cheaper, faster latency. Same 1M context.
Are migrating from Sonnet 4.6✅ Upgrade immediately. Double-digit gains on half the benchmarks.
Are cost-sensitive but need near-frontier quality✅ Sonnet 5 at introductory $2/$10 through Aug 31. Unbeatable value.
Need proven stability for critical infrastructure🔶 Opus 4.8. 5 weeks of production hardening vs days-old release.

Conclusion: The Sonnet That Ate Opus's Lunch

Claude Sonnet 5 is the closest a Sonnet-tier model has ever come to matching the concurrent Opus flagship. On knowledge work it wins outright. On HLE with tools it ties. On agentic coding and computer use it's within 2–3 points. Only on raw reasoning without tools and the hardest SWE-bench Pro tasks does Opus maintain a clear edge.

At $3/$15 (and $2/$10 introductory through August 31), Sonnet 5 is 1.7× cheaper than Opus 4.8. For the vast majority of professional use cases — knowledge work, agentic coding with tools, desktop automation, document analysis — Sonnet 5 delivers near-flagship quality at Sonnet prices. The $25/1M Opus premium is now only justified for the hardest reasoning tasks, the most complex GitHub issues, and workloads where a 6-point Pro accuracy gain translates to real business value.

Anthropic's official line: "For many developers, the agentic AI era began with Sonnet-class models... More recently, though, the clearest gains in agentic capabilities have been in our Opus-class models." Sonnet 5 is Anthropic closing that gap. The agentic era is back in the Sonnet tier — at Sonnet prices.

🔬 Side-by-Side Test

Run Claude Sonnet 5 and Opus 4.8 on your own code. See the 93%-capability-at-60%-cost advantage in practice. Sandboxes stay alive even when you close your laptop.

🔄 Compare Side by Side →

Sources & Links

Read This Next