The Claude Sonnet 5 is designed to be the most efficient Sonnet yet. It can make plans, use tools like browsers and terminals, and operate autonomously at a level that would have required larger, more expensive models just a few months ago.
For many developers, the era of agent-based AI began with the Sonnet-class models: Claude Sonnet 3.5, 3.6 and 3.7 were the first models to demonstrate impressive programming and tooling skills. However, recently the most obvious gains in agent capabilities have been achieved in our Opus class models.
The Sonnet 5 closes the gap, offering similar specs to the Opus 4.8 but at a lower price. This is a significant improvement over its predecessor, Sonnet 4.6, in important aspects of agent performance such as reasoning, tool use, coding, and intelligence:
Sonnet 5 scores by various estimates compared to Sonnet 4.6 and Opus 4.8 scores (a more general model for reference). The Claude Sonnet 5 system map details a broader set of assessments.
Our security assessments found that Sonnet 5 exhibits generally lower levels of unwanted behavior than Sonnet 4.6 and is generally safer to use in agent-based contexts. Ratings also show that its ability to perform cybersecurity tasks is much lower than our current Opus models.
Starting today, Claude Sonnet 5 is available on all plans: it is the default model for the Free and Pro plans, and is available for Max, Team, and Enterprise users. It is also available on Claude Code and the Claude Platform, where it is launching with an initial price of $2 per million input tokens and $10 per million output tokens until August 31, 2026, after which it will be priced at $3 per million input tokens and $15 per million output tokens. Developers can use claude-sonnet-5 via Claude’s API.
Working with Claude Sonnet 5
The charts below compare the performance of Sonnet 5, Sonnet 4.6, and Opus 4.8 at various effort levels in the BrowseComp agent-based search assessment and the OSWorld-Verified computer usage assessment. Sonnet 5 (orange line) is a significant improvement over Sonnet 4.6 (gray line). Opus 4.8 (yellow line) is still the model of choice for higher precision solving these problems, but Sonnet 5 provides developers with cheaper options that are of much higher quality than those previously available. Between Sonnet 5 and Opus 4.8, users can adjust the level of effort to find the right balance of cost and performance.
Feedback from our Early Access partners has been consistent: Sonnet 5 is much more active than its predecessors. Testers described how it handles complex tasks that previous Sonnet models couldn’t handle, how it checks its own output without being explicitly asked, and how it does all that agent work at an attractive price:
Claude Sonnet 5 provides our agents with a powerful execution layer for multi-phase software development work. It handles continuous coding, tooling, and debugging well in complex technical contexts, and is especially useful for workflows where follow-through and technical reasoning are important.
We tasked Claude Sonnet 5 with a two-part task: updating Salesforce account levels and sending a launch announcement to corporate contacts, and it was complete. Previously it stopped halfway. For everyday automation this is a no-brainer.
Claude Sonnet 5 allows you to do more with less. Same output quality, fewer steps to achieve that result. It also rejects unsafe requests clearly and consistently. At Lovable, we put powerful tools in the hands of millions of builders. A model that knows when to say no is just as important as a model that knows how to build.
We’ve used Claude Sonnet 5 for dozens of our most complex real-world pull requests, and each one produced a proven, proven result on its own, allowing our engineers to focus on judgment, decision making, and final approval.
I asked Claude Sonnet 5 to investigate the error. Without prompting, he wrote a replay test, implemented a fix, and then hid it to confirm that the bug had returned unchanged. All in one pass.
With Claude Sonnet 5, agents follow the plan, follow our agreements, and make clean, multi-step changes—all at an effective cost.
Claude Sonnet 5 has the best implementation of brownfield code – race conditions, hidden tests, parts no one wants to touch. It tracks the actual root cause of the failure and suggests a reliable fix instead of fixing the symptom.
Claude Sonnet 5 is on the Pareto frontier for Eve’s plaintiff defense problems. We see the most obvious benefits in legal research and analysis with value for money that has made the migration choice easier.
ClickHouse agents study data in real time and quickly produce analytical information, so the time to obtain information is of great importance when testing new models. Claude Sonnet 5 speaks more clearly and makes our users respond noticeably faster. This speed is the difference our customers experience.
At Pace, our computer-based agents manage insurance workflows—claims intake, FNOL, claims analysis—on systems our operations teams already use. Claude Sonnet 5 consistently takes the right actions and does them quickly, which is what real insurance work requires.
Security assessment
Our pre-deployment security assessments found Sonnet 5 to be an overall improvement over Sonnet 4.6. In terms of agent-based security, the model is better at rejecting malicious requests and resisting eavesdropping attempts from rapid penetration attacks. The model shows lower levels of hallucination and sycophancy than Sonnet 4.6. In our automated behavioral audit, which checks for a wide range of non-consensual behavior such as cooperation with abuse and deception, Sonnet 5 received lower scores overall (i.e. safer). However, in this evaluation it showed slightly higher levels of inconsistent behavior compared to the higher-performing Opus 4.8 and Claude Mythos Preview.
Level of inconsistent behavior in our automated behavioral audit, which checks for a very wide range of unwanted behavior in many situations and contexts (see section 6.4 of the Sonnet 5 system map for a complete list and results for each specific behavior). Sonnet 5 exhibits an overall lower level of inconsistent behavior than Sonnet 4.6, but higher than Mythos Preview and Opus 4.8.
We did not intentionally train Sonnet 5 on cybersecurity issues. It can perform some routine, benign cyber tasks, but when testing potentially dangerous cyber skills such as developing software exploits, it performs significantly worse than models like Opus 4.8 and Mythos 5. The results of one evaluation that tested the models’ ability to develop exploits for vulnerabilities in the Firefox browser are shown in the chart below. Sonnet 5 was never able to develop a full working exploit, but it shows a slightly higher level partial success than Sonnet 4.6. The latter change is likely due to improvements in general intelligence rather than specific training.
A score measuring the success of models in developing exploits for software vulnerabilities in Firefox 147 (this score was developed in collaboration with Mozilla; all vulnerabilities were fixed in Firefox 148). For each model, the bar on the left shows how often the model (without protections) developed a working exploit; the right bar shows how often the model was partially successful. Neither Sonnet model was able to successfully develop a working exploit (both scored 0.0%); Sonnet 5 showed a slightly higher partial success rate than Sonnet 4.6. Both Sonnet models have significantly worse cyber capabilities than the Opus 4.8 and Mythos 5. See section 3.2.4 of the Sonnet 5 system map for details.
Because Sonnet 5 is slightly better at these tasks than its predecessor, we launched it with cyber protection enabled by default. These security measures, which detect and block dangerous cyber usage in real time, are similar to those present in Claude Opus 4.7 and 4.8 (since we concluded that the overall level of cybersecurity risk from Sonnet 5 was low, the security measures are less stringent than those launched in Fable 5, which block a much wider range of cybersecurity tasks).1
Our full assessment of Sonnet 5 based on multiple security and feature evaluations is presented in the Claude Sonnet 5 system card.
Availability and prices
Today, Claude Sonnet 5 is available everywhere at an introductory price of $2 per million input tokens and $10 per million output tokens until August 31, 2026. It then moves to a standard price of $3 per million input tokens and $15 per million output tokens.2 We have increased the speed limits in Chat, Cowork, Claude Code and Claude Platform.3 to enable higher token usage at higher effort levels; users can choose the level that makes sense for their specific project.