Claude 4 lands! A top-tier AI coding model debuts, programming autonomously for 7 hours straight at full coding efficiency


Early this morning, U.S.-based large-model unicorn Anthropic officially unveiled its next-generation Claude models at its inaugural developer conference: Claude Opus 4 and Claude Sonnet 4, the first major version number update for Claude since June 2024.

Anthropic calls Claude Opus 4 "the world's best programming model," one that delivers consistent performance across complex, long-running tasks and agent workflows. Claude Sonnet 4 has coding and reasoning capabilities at its core, while responding more accurately to user prompts. Both models are hybrid models, offering two modes: immediate response and extended thinking for deeper reasoning.

On the authoritative programming benchmark SWE-bench Verified, Claude Opus 4 and Claude Sonnet 4 with extended thinking turned on scored 79.4% and 80.2%, respectively, substantially outperforming OpenAI Codex-1, OpenAI o3, OpenAI GPT-4.1, Gemini 2.5 Pro, and other models.

Both models outperformed OpenAI o3 on benchmarks for programming, tool use, visual reasoning, and math, while Claude Opus 4 scored on par with OpenAI o3 on multilingual Q&A and graduate-level reasoning tasks. The new models also bring upgraded agent capabilities, with up to 7 hours of autonomous operation, and new features such as a Files API and prompt caching.

Subscribers to Claude's Pro, Max, Team, and Enterprise plans have access to both models and their extended thinking modes, and Sonnet 4 is also available to free users.

Both models can be called through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI, and pricing is consistent with the previous Opus and Sonnet models: $15/$75 per million tokens (input/output) for Claude Opus 4, and $3/$15 for Claude Sonnet 4.
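To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the per-million-token rates quoted above. The function name, the dictionary keys, and the example token counts are illustrative assumptions and are not part of any Anthropic SDK.

```python
# Per-million-token prices in USD (input, output), as quoted in this article.
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES_PER_MILLION = {
    "claude-opus-4": (15.0, 75.0),
    "claude-sonnet-4": (3.0, 15.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 20k output tokens.
    for name in PRICES_PER_MILLION:
        print(name, round(estimate_cost(name, 200_000, 20_000), 2))
```

At these rates, the hypothetical workload costs about $4.50 on Claude Opus 4 and about $0.90 on Claude Sonnet 4.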

Anthropic also released Claude Code at the same time, an AI programming assistant that taps into the Claude Opus 4 model to map and interpret million-line codebases in real time. Claude Code integrates with GitHub, GitLab, VS Code, JetBrains IDEs, and command-line tools, and can be embedded directly into development terminals. The programming assistant is available under three plans: usage-based billing, $100 per month, and $200 per month.

01. A major new model dropped two minutes into the keynote; Claude will be updated more frequently in the future

At the Code with Claude developer conference, Anthropic co-founder and CEO Dario Amodei walked briskly onto the stage at the 2-minute mark of the opening session and, without any preamble, delivered the biggest surprise of all: the release of the Claude 4 series of models.

Amodei said Anthropic hasn't updated the Opus model in a while, and that Anthropic positions Opus as the most powerful and intelligent model in its portfolio, while Sonnet is the mid-level model that users have been using for the past year or so.

On a number of benchmarks, Claude Opus 4 did not score significantly higher than Claude Sonnet 4, and in some cases scored slightly lower. Amodei emphasized that for a large model like Claude Opus 4, benchmarks are not a complete reflection of its capabilities.

In previews that Anthropic provided to customers and in use within Anthropic, Claude Opus 4 autonomously performed tasks that would normally take a human 6 to 7 hours, and some of Anthropic's most senior engineers have been amazed at the productivity gains it brings.

Claude Sonnet 4 performs on par with Claude Opus 4 on a number of programming benchmarks, but it is more streamlined and focuses on specific tasks such as programming. Claude Sonnet 4 also addresses a variety of issues that arose from real-world use of Claude Sonnet 3.7, including over-eagerness (the tendency to do more than the user asked for) and reward hacking.

Amodei said that Anthropic will continue to improve the Claude series by releasing minor updates on a regular basis, ideally more often than in the past.

02. Agent capabilities upgraded: up to 7 hours of autonomous operation

Mike Krieger, chief product officer at Anthropic and co-founder of Instagram, shared more about Claude 4 in detail.

According to Krieger, Claude Opus 4 excels at understanding a codebase and planning additions to it, and is very efficient and accurate at migrations, code refactoring, and even the most complex agent workflows.

Claude Sonnet 4 excels at everyday coding tasks, application development and pair programming. It is also suitable for high-traffic use cases, balancing efficiency and performance, and can be considered a "24/7" coding partner.

The Claude 4 series models have been upgraded with key new features for building agents that can use tools. They can now use multiple tools in parallel, and when granted access to local files, they can even retain memories between sessions, building up knowledge over time.
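As a rough illustration of what parallel tool use means on the application side, the sketch below runs several requested tool calls concurrently instead of one after another. The tool names, the request format, and the simulated delays are all hypothetical; this is not the Anthropic API's tool-calling format, only a minimal asyncio example of the idea.

```python
import asyncio


# Hypothetical tools; the sleeps stand in for real I/O (HTTP calls, file reads).
async def search_docs(query: str) -> str:
    await asyncio.sleep(0.1)
    return f"docs results for {query!r}"


async def read_file(path: str) -> str:
    await asyncio.sleep(0.1)
    return f"contents of {path}"


TOOLS = {"search_docs": search_docs, "read_file": read_file}


async def run_tool_calls(calls: list[dict]) -> list[str]:
    """Run every requested tool call concurrently, preserving request order."""
    pending = [TOOLS[call["name"]](**call["args"]) for call in calls]
    return await asyncio.gather(*pending)


if __name__ == "__main__":
    # A made-up batch of tool requests, as a model might emit in one turn.
    requested = [
        {"name": "search_docs", "args": {"query": "retry policy"}},
        {"name": "read_file", "args": {"path": "config.yaml"}},
    ]
    for result in asyncio.run(run_tool_calls(requested)):
        print(result)
```

Because the two simulated calls overlap, the batch takes roughly as long as the slowest call rather than the sum of both.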

Krieger recalled that shortly after he joined Anthropic, a team of only three people completed a prototype of Amazon's Alexa voice assistant with Claude's help, and Krieger, a self-described "ex-engineer," went back to writing code himself. That collaboration eventually made Claude one of the core models behind Alexa Plus.

This experience reinforced Krieger's belief in the potential of AI collaboration. Today, AI is not just a tool, but a truly intelligent collaborative partner that continues to push the boundaries of technology. Krieger presented what Anthropic sees as the three core capabilities of an ideal agent:

(1) Contextual intelligence: understands the organizational context, improves through experience, and gets better the more it is used, like a good employee;

(2) Long-running execution: handles complex tasks independently for hours and coordinates resources intelligently;

(3) Deep collaboration: interacts naturally, adapts to work styles, and is transparent in its decision-making.

To realize these three major capabilities, Anthropic has introduced more new upgrades.

Instead of just writing code, Claude can now run code through a new code execution tool on the Anthropic API, which can load datasets, clean data, generate exploratory charts, and analyze anomalies in real time. Combined with the Claude 4 models, the code execution tool can handle complex tasks and save significant time.

The autonomy of the Claude 4 series models has been further improved: Claude Sonnet 3.7 could run autonomously for up to 45 minutes, whereas Claude 4 can run independently for hours, up to 7. The new models maintain memory by managing a to-do list, so they do not lose track of the task.
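The article does not describe how this to-do list is implemented. As a minimal sketch of the general idea, an agent loop could keep a small task tracker and re-read it between steps; the class and method names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TodoList:
    """A minimal task tracker an agent loop could consult between steps."""
    pending: list[str] = field(default_factory=list)
    done: list[str] = field(default_factory=list)

    def add(self, task: str) -> None:
        self.pending.append(task)

    def next_task(self) -> Optional[str]:
        """Return the oldest unfinished task, or None when all are done."""
        return self.pending[0] if self.pending else None

    def complete(self, task: str) -> None:
        self.pending.remove(task)
        self.done.append(task)

    def summary(self) -> str:
        """Render the list as text that could be re-injected into a prompt."""
        lines = [f"[x] {t}" for t in self.done] + [f"[ ] {t}" for t in self.pending]
        return "\n".join(lines)


if __name__ == "__main__":
    todo = TodoList()
    for task in ("explore the codebase", "write the component", "run the tests"):
        todo.add(task)
    todo.complete("explore the codebase")
    print(todo.summary())
    print("next:", todo.next_task())
```

Keeping the state as plain text means it can be shown to the model again at any point in a long run, which is one simple way to avoid losing track of progress.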

Anthropic emphasized that widespread adoption of agents requires better model judgment about confidential content, decision-making, and coordination. Today, every function of the Claude models includes architectural security checkpoints and controls to ensure the models' reliability in production environments.

To be practical, agents also need access to real-world information and connectivity to existing systems. To help them reach further, Anthropic has introduced four new connectivity features.

First, developers can now connect to MCP (Model Context Protocol) servers directly through the Anthropic API. Today, MCP is used by Microsoft, Google, OpenAI, Block, Atlassian, Zapier, Linear, and many others, and Anthropic believes that MCP promises to lay the groundwork for the agent economy.

Second, web search gives Claude real-time access to current information. This lets Claude analyze current events, market trends, and emerging technologies, and it is also very powerful when used in conjunction with MCP.

Third, the Files API is available today in the Anthropic API. It allows Claude to read and write memory files to maintain contextual continuity during long tasks, and Anthropic is releasing a companion memory cookbook to guide developers on how to integrate it into their apps.

Finally, the prompt caching feature receives an upgrade, with the TTL (time to live) raised from 5 minutes to 1 hour, which reduces the cost of using the model by up to 90% and latency by up to 85%. This makes it particularly suitable for long-prompt scenarios, long-running agent workflows, and repetitive tasks that frequently invoke the same context.
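To see why a longer TTL matters for long-running agent workflows, the sketch below simulates cache hits for calls that reuse the same prompt prefix at irregular intervals. It is a local simulation only and does not call the Anthropic API; the call times are made up, and it assumes that every call resets the cache expiry.

```python
def count_cache_hits(call_times_s: list[float], ttl_s: float) -> int:
    """Count calls that arrive while a cached prefix is still alive.

    Assumes the first call writes the cache and every call (hit or miss)
    resets the expiry to `ttl_s` seconds after that call.
    """
    hits = 0
    expires_at = None
    for t in call_times_s:
        if expires_at is not None and t <= expires_at:
            hits += 1
        expires_at = t + ttl_s
    return hits


if __name__ == "__main__":
    # Hypothetical agent run: calls at 0, 2, 12, 40 and 95 minutes.
    calls = [m * 60 for m in (0, 2, 12, 40, 95)]
    print("5-minute TTL hits:", count_cache_hits(calls, 5 * 60))
    print("1-hour TTL hits:", count_cache_hits(calls, 60 * 60))
```

With these assumed timings, the 5-minute TTL produces 1 hit out of 5 calls, while the 1-hour TTL produces 4, which is where the cost and latency savings for slow, multi-step workflows would come from.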

Anthropic has also significantly reduced the tendency of Claude 4 models to use shortcuts or loopholes to complete tasks. The new models are 65% less likely to engage in this behavior than Sonnet 3.7.

Claude Opus 4 also significantly outperforms all previous models in memory capabilities. When developers build applications that give Claude access to local files, Opus 4 can skillfully create and maintain "memory files" to store critical information. This unlocks better long-term task awareness, consistency, and performance on agentic tasks, such as Opus 4 creating a "navigation guide" while playing Pokémon.
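The article does not specify the format of these memory files. As a minimal sketch of the concept, an application could persist notes as JSON on disk so that they survive across sessions; the file name, key names, and helper functions below are hypothetical and do not represent Anthropic's implementation.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Read saved notes from disk, or start empty if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def remember(path: Path, key: str, note: str) -> dict:
    """Add or overwrite one note and persist the whole memory file."""
    memory = load_memory(path)
    memory[key] = note
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return memory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory_file = Path(tmp) / "agent_memory.json"
        # Session 1: the agent records something it learned.
        remember(memory_file, "build", "run tests before opening a pull request")
        # Session 2: a fresh process reloads the same notes from disk.
        print(load_memory(memory_file))
```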

The Claude 4 models also introduce a thinking summary feature, which uses a smaller model to condense lengthy thought processes.

03. The programming assistant is now generally available, with integration into mainstream development platforms

Claude Code becomes generally available today, moving from a research preview to a full-fledged product. Cat Wu, Claude Code product manager, shared that in addition to gaining access to the latest models, Claude Code has introduced several new features.

Claude Code is now integrated with major IDEs such as VS Code and JetBrains, so developers can view the code changes Claude Code suggests in real time, directly in the editor.

Anthropic has also released the Claude Code SDK, which allows developers to incorporate Claude Code as a building block into their own applications and workflows. To demonstrate its potential, Anthropic has open-sourced a sample project on GitHub: users can mention @Claude directly in pull requests and issues, and it will automatically respond to review comments, fix bugs, and add new features.

With these updates, Claude Code has been able to cover most work scenarios, whether it's deep development in the terminal, handling remote collaboration on GitHub, building automated workflows through the SDK, or code review in the IDE.

During the demo session, Claude Code quickly developed a table component for the open-source tool Excalidraw: it created a task list, explored the codebase, generated code, ran tests, submitted a pull request, and automatically updated the documentation through GitHub Actions. In just 10 minutes, Claude Code accomplished a complex task that would normally take hours, dramatically improving development efficiency.

Claude Code's GitHub Actions integration, powered by the Claude Code SDK, is now available and can be installed by running a single command. The VS Code and JetBrains IDE extensions are also available in beta and can be installed by running a command in the IDE.

04. Conclusion: both scaling-law paths continue to hold, and the next year will witness a programming revolution

According to Amodei, Claude Sonnet 3.7 was released only two and a half months ago, yet it feels as if a year has passed, which shows how fast the AI field is evolving. He emphasized that Claude 4's capabilities come from joint advances in pre-training and post-training: the pre-training scaling law still holds, and post-training techniques are evolving in tandem, with the two complementing each other.

Looking to the future of AI, Amodei believes that the coming year will witness a revolution in programming, starting with Claude Code and moving into the era of "agent fleets," in which batches of agents automate software development and the cost of custom software falls dramatically. This will reshape the roles of developers, enterprises, and startups.
