Product Launch · Xiaohu Explains

Anthropic Ships Claude Sonnet 5: 40% Cheaper, Matching Opus 4.8 on Some Tasks

Official benchmarks show that at high effort levels it matches Opus 4.8 on some tasks — yet its standard price is only 60% of Opus's.

At a glance

Anthropic has released Claude Sonnet 5, calling it the most capable Sonnet model yet at running tasks autonomously (agentic).
Promo pricing is $2 input / $10 output per million tokens (through August 31, 2026), then rising to $3 / $15; for comparison, the flagship Opus 4.8 is priced at $5 / $25.
Live today across the Free, Pro, Max, Team, and Enterprise plans, plus Claude Code and the Claude Developer Platform — and it's the default model on the Free and Pro plans.
Safety testing shows a lower overall misbehavior rate than the previous Sonnet 4.6, but its cyberattack ability (e.g. developing software exploits) is markedly weaker than Opus 4.8, and Anthropic turned on real-time cybersecurity protections by default.
It uses a new tokenizer, so the same text may be cut into more tokens (about 1.0 to 1.35×); the promo pricing already factors this in, making the upgrade roughly cost-neutral once you do the math.

⚑ Editorial note: all data here comes from Anthropic's official launch page, its own benchmarks, and the system card — i.e., the vendor's own account. The performance comparisons, safety scores, and success rates below are all official or self-reported figures; we relay them faithfully without flagging each one's accuracy.

1 In one sentence

With this launch, the cheap one catches up to the expensive one

Anthropic has released Claude Sonnet 5, calling it the most capable Sonnet model to date and the best at running tasks autonomously (agentic — meaning the model can break a task down on its own, call tools like a browser and a terminal, run through several steps in a row, and proactively check its own work along the way).

Here's the counterintuitive part: Sonnet 5's standard price is only 60% of the flagship Opus 4.8, yet official benchmarks show that once you turn the effort level up, it can match Opus 4.8 on some tasks.

📌

Why it matters: per million output tokens, Sonnet 5's standard price is $15 and Opus 4.8 is $25 — exactly 60%; the promo price drops even lower, to $2/$10 input/output. And on BrowseComp (agentic search) and OSWorld-Verified (computer use), turning the effort up lets Sonnet 5 tie with Opus 4.8. 'Cheaper' and 'reaches the flagship' land on the same Sonnet for the first time.

← Cheaper · low effortPricier · flagship effort →

Sonnet 4.6

Sonnet 5

Opus 4.8

Illustration: Opus 4.8 sits at a fixed high point. Sonnet 4.6 only covers a short low stretch and tops out early; Sonnet 5 stretches its band long via effort levels, with its right end closing in on Opus 4.8. The same money spent on Sonnet 5 buys a wider band of intelligence.

Claude Sonnet 5 benchmark comparison table

Official benchmark comparison: Sonnet 5 vs. the previous Sonnet 4.6, plus the more all-round Opus 4.8 (for reference). Full benchmarks in the Claude Sonnet 5 system card. Source: Anthropic's website.

2 The backstory

The models that get work done have always come from the Sonnet line first

For many developers, the whole 'AI that gets work done on its own' trend started with Sonnet: Claude Sonnet 3.5, 3.6, and 3.7 were among the first models to turn heads at writing code and calling tools. But lately the sharpest gains have come from the pricier Opus line, and the Sonnet track fell behind. What Sonnet 5 sets out to do is close that gap.

3.5

First to show
agentic ability

3.6

3.7

4.6

Opus pulls
ahead

Closes
the gap

Compared with the previous Sonnet 4.6, Anthropic says Sonnet 5 shows clear gains across the key areas tied to agentic performance: reasoning, tool calling, coding, and knowledge work.

3 Bang for the buck

How much intelligence a dollar buys now

Anthropic released two cost-performance curves comparing Sonnet 5, Sonnet 4.6, and Opus 4.8 across effort levels — the x-axis is cost per task, the y-axis is benchmark score. The takeaway: Sonnet 5 (orange line) beats Sonnet 4.6 (gray line) across the board, covers a far wider cost range than Opus 4.8 (yellow line), delivers a clear value jump at mid effort, and matches Opus 4.8 on some tasks at the top level.

BrowseComp (agentic search) OSWorld-Verified (computer use)

Sonnet 5 Sonnet 4.6 Opus 4.8

Illustrative curve: as the effort level rises, Sonnet 5's score keeps climbing, nearing Opus 4.8 at high levels, and covers a far wider cost range than Sonnet 4.6. See the official chart below for exact figures.

Illustrative curve: same story on computer-use tasks — a clear cost-efficiency jump at mid level, matching Opus 4.8 on some tasks at high levels. See the official chart below for exact figures.

Cost-performance curves across effort levels

Official cost-performance curves: the previous Sonnet 4.6 clearly can't reach Opus 4.8; Sonnet 5 covers a wider cost range and matches Opus 4.8 on some tasks. Sonnet 5 here is priced at the standard $3/$15; at the promo $2/$10 the real cost is lower. xhigh = the top effort level. Source: Anthropic's website.

Two benchmark-methodology updates (June 30 correction)

Anthropic revised this launch post on June 30: the original BrowseComp chart used a simpler method that underestimated Sonnet 5, and it has now been redrawn with the standard method from the system card (10M token budget + compaction + programmatic tool calls). Two older scores were also corrected because the scoring method changed: Sonnet 4.6's Humanity's Last Exam score is updated to 34.6% (no tools) / 46.8% (with tools); Sonnet 4.6's OSWorld-Verified score is updated to 78.5%. These differ from the numbers in the Sonnet 4.6 launch blog precisely because the benchmarking method changed.

4 How it works

Spend more compute, think one step further: how one model is both cheap and top-tier

Sonnet 5 can span such a wide price range thanks to a mechanism called effort (a compute/reasoning-intensity level): with the same model, you choose 'how hard it thinks.' Lower levels are cheaper and faster but may be less careful; higher levels spend more compute reasoning over and over and double-checking itself, giving more accurate answers but at greater cost and slower speed.

An analogy

It's like ordering the same dish at the same restaurant: you can have the chef cook it as usual, or pay extra to have him take more care and taste it himself before serving to make sure it's right. It's the same dish; what changes is how much care he puts in and how many times he checks. The effort level dials exactly that 'level of care.'

low

Cheap, fast
may miss details

medium

Best value
strong cost efficiency

high

Harder reasoning
more accurate, steadier

xhigh

Top level
matches Opus 4.8 on some tasks

Taller bar = harder thinking = more accurate, but more compute (money and time) too. xhigh means extra high — the top level.

Core innovation

In the past, more capability meant switching to a bigger, pricier model. Now you don't switch models — you just turn a knob: low levels are the cheap, fast entry tier; the high level (xhigh) spends more compute reasoning and self-checking, matching the flagship Opus 4.8 on some tasks. A single Sonnet 5 fills the entire price band from entry-level to near-flagship in one go, instead of topping out early like Sonnet 4.6. Where to strike the balance between cost and performance is left for you to decide per project.

5 Early feedback

Early users say: no nudging needed — it checks its own homework

Anthropic says feedback from early-access partners was fairly consistent: Sonnet 5 is noticeably better at 'getting work done on its own' than earlier generations. Here are a few of the testers' observations, plainly stated.

On complex tasks, it keeps going until they're done, whereas earlier Sonnets often stopped halfway.
Even when no one explicitly asks, it proactively checks whether its own output is correct.
And the price for doing this kind of autonomous work is quite attractive.

6 Safety assessment

Safer overall — but its cyberattack ability was deliberately held down

Pre-deployment safety testing shows Sonnet 5 is safer overall than Sonnet 4.6: better at refusing malicious requests, more resistant to prompt injection (where an attacker secretly plants malicious instructions into the web pages or emails the model processes, trying to hijack it into serving the attacker rather than the user), and less prone to hallucination and sycophancy. In an automated behavioral audit spanning many types of misbehavior, it scores lower overall (i.e., safer) — but still higher than the stronger Opus 4.8 and Claude Mythos Preview.

Sonnet 4.6

Sonnet 5

Opus 4.8

Mythos Preview

Relative illustration (longer bar = higher misbehavior rate = less safe): Sonnet 5 is below Sonnet 4.6 but above Opus 4.8 and Mythos Preview. Lengths are a relative ranking, not exact values — see the official chart below for specifics.

Misbehavior rates by model in Anthropic's automated behavioral audit: Sonnet 5 is lower overall than Sonnet 4.6 (safer) but higher than Mythos Preview and Opus 4.8. Full list in system card section 6.4. Source: Anthropic's website.

Cybersecurity is the one area held down on its own. Anthropic says it did not specifically train Sonnet 5 on cybersecurity tasks: it can handle routine, harmless network tasks, but on potentially harmful benchmarks like developing software exploits, it performs markedly worse than Opus 4.8 and Mythos 5.

7 Field test · Firefox

A concrete test: can it break into Firefox

'Weak cyberattack ability' sounds abstract, so Anthropic gave concrete numbers: have each model develop an exploit for vulnerabilities in the Firefox browser. This benchmark was built jointly by Anthropic and Mozilla, and all the vulnerabilities involved have already been fixed in Firefox 148.

0.0%

Sonnet 5's success rate at fully building a working exploit

0.0%

Sonnet 4.6's full success rate — tied with Sonnet 5

Neither Sonnet can build a single complete, working exploit (both at 0.0%). Sonnet 5 is only slightly higher than Sonnet 4.6 on partial success rate, which Anthropic reckons mostly spills over from stronger general intelligence rather than dedicated training. For comparison, both Opus 4.8 and Mythos 5 have far stronger cyberattack ability than either Sonnet.

Model scores on the Firefox 147 exploit benchmark

Firefox 147 exploit benchmark (built jointly by Anthropic and Mozilla; the vulnerabilities have been fixed in Firefox 148): for each model, the left bar = success rate for a complete, working exploit, the right bar = partial success rate. Both Sonnets are at 0.0% on full success, with Sonnet 5 slightly above Sonnet 4.6 on partial; both are far below Opus 4.8 and Mythos 5. See system card section 3.2.4. Source: Anthropic's website.

Because Sonnet 5 is slightly stronger than its predecessor on these tasks, Anthropic turned on real-time cybersecurity protections by default — detecting and blocking dangerous network uses in real time, at the same tier as Claude Opus 4.7 and 4.8. Anthropic judges Sonnet 5's overall cybersecurity risk to be low, so these protections are more lenient than Fable 5's (which block a far wider range of cybersecurity tasks).

8 The pricing catch

Looks like a price cut — really it's a new measuring stick

Sonnet 5 switched to a new tokenizer. Before processing text, the model first cuts it into tokens for billing and computation. With the new tokenizer, the same text may be cut into more tokens — about 1.0 to 1.35× (depending on content type). In other words, the per-token price dropped, but the same passage now burns more tokens, so the real unit cost didn't fall as much as it looks.

Old tokenizer (Sonnet 4.6)fewer tokens

Thesamepieceoftextprocessed

New tokenizer (Sonnet 5)more tokens (~1.0 to 1.35×)

Thesamepieceoftextprocessed

Illustration: the splits shown are for demonstration only, not real token boundaries. The same passage is cut finer under the new tokenizer, so the token count goes up.

Price alone

Per million tokens, it drops from Sonnet 4.6's price to the promo $2/$10.

Counting the extra tokens

The promo pricing was set precisely to offset the tokenizer change, making the move from Sonnet 4.6 to Sonnet 5 roughly cost-neutral.

Anthropic says outright: the promo pricing is set so that this upgrade works out to be roughly cost-neutral. That's why 'price cut' belongs in quotes — you have to do the math with the token count included. This tokenizer change is the same kind of move as the one with Claude Opus 4.7.

9 Available now

Available now: where it's live, the price table, and how to choose

Sonnet 5 goes live across all plans today: it's the default model on the Free and Pro plans, and available to Max, Team, and Enterprise users too; it also launches on Claude Code and the Claude Developer Platform, and developers can call claude-sonnet-5 via the Claude API.

Model	Input / Output (per million tokens)
Sonnet 5 (promo, through 2026-08-31)	$2 / $10
Sonnet 5 (standard price after)	$3 / $15
Opus 4.8 (for comparison)	$5 / $25

Now
$2 / $10
promo

→

From 2026-09-01
$3 / $15
standard

To accommodate the higher token consumption that comes with higher effort levels, Anthropic has raised the rate limits across Chat, Cowork, Claude Code, and the Claude Developer Platform, so you can pick the right level per project.

How to choose

Who you are	Recommendation
Developers	Want stronger agentic coding and tool calling on the same budget: use a high level. Want to save money: dial effort down to get near-flagship results at lower cost, and find your own balance between cost and performance.
Enterprise / Teams	Rate limits on Chat, Cowork, Claude Code, and the Developer Platform have been raised to match the higher token consumption at high levels.
Security work	Default cybersecurity protections match Opus 4.7/4.8; if you need less-restricted cybersecurity research or offense/defense work, Anthropic recommends using Opus 4.8 rather than Sonnet 5.

Sonnet 5 narrows the gap: its performance is close to Opus 4.8, but at a lower price. Anthropic, "Introducing Claude Sonnet 5"

This piece is compiled from Anthropic's "Introducing Claude Sonnet 5" (including the June 30, 2026 correction) and the Claude Sonnet 5 system card. The benchmark scores, success rates, and pricing herein are all Anthropic's official and self-reported figures; some charts are the original official images, and illustrations/illustrative rankings are labeled as such. Actual performance depends on your own use.