Claude Sonnet 4.5 comes out strong: Programming ability upgraded again, over 30 hours of autonomous code writing

1,216 0

Today, Claude Sonnet 4.5 was officially released, a model that can sustain focused processing of complex multi-step tasks for over 30 hours inprogramAbility, Computer Operations topped the list, surpassing the GPT-5 in a variety of areas, including reasoning, math, and intelligent body programming.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Claude Sonnet 4.5 billing rates remain the same as Claude Sonnet 4, i.e., $3 (about Rs. 21.4) per million tokens input and $15 (about Rs. 106.8) per million output.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

In addition, Claude Code has added a new checkpoint feature that allows users to save progress and supports instant rollback, and Anthropic has updated its terminal interface and released a native VS Code extension plugin.

AnthropicThe Claude Agent SDK, the core component of Claude Code, is also open to developers, enabling users to directly utilize the underlying architecture that underpins their products for secondary development.

The Claude API also adds contextual editing and memorization tools that can help intelligences continue to handle more complex tasks. Code execution and file generation (spreadsheets/slideshows/documents) have been seamlessly integrated into the dialog flow in the Claude app.

The above features are available in open public beta today on Claude Developer Platform, Amazon Bedrock and Google Cloud Vertex AI.

I. Capable of handling over 30 hours of tasks, Claude Sonnet 4.5 surpasses the GPT-5

Claude Sonnet 4.5 is firmly at the top of the list in the SWE-bench Verified review, which measures real programming ability. Real-world tests show that the model can sustain focused processing of complex multi-step tasks for more than 30 hours.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Claude Sonnet 4.5 took first place in the OSWorld Benchmark Evaluation, which tests the real-world computer-operating capabilities of AI models, with a score of 61.41 TP4T, compared to 42.21 TP4T for Sonnet 4 four months ago.As demonstrated in the demo below, Claude runs directly in a browser environment and automates the entire process of website navigation, form filling and automate the entire process of website navigation, form filling and task execution.

The model outperforms the GPT-5 on a number of assessments, including reasoning, math, and intelligent body programming:

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Experts from the fields of law, finance, medicine, and STEM confirm that Claude Sonnet 4.5 makes significant advances in specialized domain knowledge acquisition and reasoning skills compared to older models (including Opus 4.1).

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Anthropic claims that Claude Sonnet 4.5 is not only the most powerful model, but also their most values-aligned cutting-edge AI system to date. By leveraging the model's increased capabilities and in-depth safety training, the team improved Claude Sonnet 4.5's behavioral patterns, effectively reducing undesirable tendencies such as flattery, deception and concealment, the pursuit of power, and the promotion of delusional thinking.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

▲ Overall misbehavior score in the Automated Behavior Audit System, with lower values being better. Misbehaviors include, but are not limited to, deception, sycophancy, power chasing, promoting delusions, and obeying harmful system commands.

Second, the introduction of native VS Code extension plug-ins, Claude Code intelligent body capacity upgrade

Claude Code has also introduced several upgrades: a native VS Code extension plug-in, version 2.0 of the terminal interface, and support for checkpointing features that run autonomously.

The native VS Code beta extension embeds Claude Code directly into your IDE. With an exclusive sidebar panel and in-line diff comparison, users can view code changes made by Claude in real-time. This extension provides a richer, more visual Claude Code experience than the terminal for those who prefer IDE development.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Claude Code's terminal interface has also received an update, with a new version that improves status visualization and adds a searchable command history.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

The Claude Agent SDK (formerly Claude Code SDK) opens up the core tools, context management system, and permissions framework that drive Claude Code for teams that need to build customized intelligences for their workflows.The Claude Agent SDK adds SDK support for sub-intelligences and hooks, giving developers the flexibility to build intelligences that fit into specific developers can more flexibly build intelligences for specific workflows.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

As the tasks Claude Code takes on become more complex, the checkpoint feature allows users to feel more comfortable delegating tasks to Claude Code while maintaining control.

The new checkpointing system automatically saves the state before each code change, allowing users to instantly rewind to a historical version by simply double-clicking the Esc key or using the /rewind command.

When backing out of a checkpoint, the user can choose to restore the code, dialog records or both (Note: checkpoints only record Claude's editing operations, and do not contain user editing or bash commands, so it is recommended to use them together with a version control system).

Third, performance improvement of 39%, Token saving of 84%, Claude developer platform context management function upgrade

The Claude Developer Platform introduces two new context management features: context editing and memorization tools.

As intelligences in production environments process increasingly complex tasks and generate a large number of tool call results, they tend to exhaust the effective context window, forcing developers to face the dilemma of truncating dialog records or sacrificing performance.

Context editing automatically cleans up stale tool calls and results in the context window when the token capacity is approaching its upper limit. As the intelligences continue to perform tasks and accumulate tool call records, this feature can significantly extend the autonomy of intelligences by removing obsolete content while maintaining the integrity of the dialog flow, and it can also improve the actual performance of the model by focusing on the core contexts.

Claude Sonnet 4.5强势登场：编程能力再升级，自主写代码超30小时

Memory tools, on the other hand, allow Claude to store and recall information outside of the context window through a file-based system.Claude can create, read, update, and delete files in a memory directory dedicated to the user's infrastructure, and these files will persist across multiple conversations.

This feature allows intelligentsia to build knowledge bases incrementally, maintain item status across sessions, and refer to historical learning outcomes without having to cram everything into the context window.

Memory tools are run entirely through client-side tool calls, allowing developers to manage the storage back-end independently, thus taking full control of where data is stored and how it is persisted.

Claude Sonnet 4.5 can dynamically track the number of available tokens throughout the entire conversation with its built-in context-awareness capabilities, support longer conversations by automatically cleaning up stale tool results in the context, and continuously improve the accuracy of responses by storing key information in memory and passing it on across conversations.

Claude Sonnet 4.5 can handle complete code bases, analyze hundreds of documents, and maintain extensive tool interaction histories. Context management builds on this foundation to ensure that intelligentsia can both efficiently utilize extended capacity and handle workflows that exceed fixed limits.

In the Intelligent Body Search internal evaluation, the research team also tested the effectiveness of context management in enhancing complex multi-step tasks. The results showed that the combination of the memory tool and contextual editing improved performance by 391 TP4T over baseline performance, and contextual editing alone improved performance by 291 TP4T.

In 100 rounds of web search tests, contextual editing enabled the intelligences to successfully complete tasks that would have otherwise failed due to context exhaustion, while reducing token consumption by 841 TP4T.

Conclusion: Anthropic further refines its smart body development ecosystem

This release is a series of upgrades to Anthropic from the underlying model to the toolchain.

At the model level, Claude Sonnet 4.5 is able to focus on complex tasks for more than 30 hours on a continuous basis, a capability that opens up more possibilities for automating processes with long lead times and multiple steps.

Anthropic has built an ecosystem for intelligent body development through the upgrading of Claude Code, the opening of the Agent SDK, and the introduction of context management features that together address a pain point in intelligent body development: how to handle complex real-world tasks within a limited window.

artifact # Programming

The copyright of the article belongs to the author, please do not reprint without permission.

OpenAI Releases AI Browser ChatGPT Atlas, Challenging Chrome Supremacy

artifact # OpenAI

谷歌推出超小型AI模型Gemma 3 270M！手机能跑，智能设备离线运行新突破

Google launches ultra-small AI model Gemma 3 270M! Cell phones can run it, a new breakthrough for smart devices running offline!

artifact # Gemma 3

Claude Sonnet 4.5 comes out strong: Programming ability upgraded again, over 30 hours of autonomous code writing

I. Capable of handling over 30 hours of tasks, Claude Sonnet 4.5 surpasses the GPT-5

Second, the introduction of native VS Code extension plug-ins, Claude Code intelligent body capacity upgrade

Third, performance improvement of 39%, Token saving of 84%, Claude developer platform context management function upgrade

Conclusion: Anthropic further refines its smart body development ecosystem

Tencent Mixed 3D-Omni, Mixed 3D-Part released and open source: 3D generation into the era of accurate and controllable

OpenAI Launches Sora 2, AI-Generated Video Upgraded, "AI Version of Jitterbit" Opens New Creative Experience

Related posts

OpenAI Releases AI Browser ChatGPT Atlas, Challenging Chrome Supremacy

Google launches ultra-small AI model Gemma 3 270M! Cell phones can run it, a new breakthrough for smart devices running offline!

Gemini 3 Flash Makes Its Grand Debut: Blazing Speed and Intelligence Surpassing Pro, Ushering in a New Chapter for AI

Tencent ima 2.0 released, one sentence search millions of knowledge bases, 200 million pieces of knowledge

No comments

Popular Articles

Popular Sites