May 23, 2025 11:30:00

Anthropic releases two models in the 'Claude 4' family, with improved coding and inference capabilities from the previous generation

Anthropic announced Claude Opus 4 and Claude Sonnet 4 , part of the Claude 4 family, a next-generation AI model, at its developer event 'Code with Claude' held on May 22, 2025. Both models are tuned to perform well in programming tasks and are suitable for writing and editing code.

Introducing Claude 4 \ Anthropic

https://www.anthropic.com/news/claude-4

Code with Claude Opening Keynote - YouTube

Claude Opus 4 is positioned as a model with the world's highest level of coding performance, and provides sustained high performance in complex, long-term tasks and agent workflows. Claude Opus 4 achieved excellent scores of 72.5% in SWE-bench Verified and 43.2% in Terminal-bench , and is capable of continuously executing intensive tasks that require thousands of steps for several hours. Anthropic promoted Claude Opus 4 as being particularly excellent in coding and complex problem solving, and is recognized as the technology that supports cutting-edge agent products.

On the other hand, Claude Sonnet 4 is a significant improvement over the previous model Sonnet 3.7, achieving a cutting-edge score of 72.7% in SWE-bench. Claude Sonnet 4 emphasizes the balance between performance and efficiency, and is characterized by its practicality for a variety of internal and external uses. It also has improved maneuverability that allows for more fine control over implementation.

Below is a table summarizing the results of various benchmarks for coding, reasoning ability, multimodal functions, and agent tasks for Claude Opus 4 and Claude Sonnet 4. Compared to the previous generation, Claude 4 has particularly improved scores for agent performance (agentic tool use) such as terminal operations and command line work, and reasoning ability for solving mathematics (high school math competition).

TechCrunch, an IT news site, noted the above results, saying, 'In SWE-bench Verified, which evaluates the coding ability of the model, Claude Opus 4 outperforms Google's Gemini 2.5 Pro, OpenAI o3, and GPT-4.1, but it does not outperform OpenAI o3 in

MMMU , a multimodal evaluation, or GPQA Diamond, which covers doctoral-level biology, physics, and chemistry-related questions.'

Claude 4 is designed as a hybrid model with two modes: immediate response and extended thinking for deeper reasoning. In particular, the feature 'extended thinking with tool use' allows Claude to provide higher quality answers by alternating between reasoning and using tools such as web searches.

While traditional AI models generate answers instantly when asked a question, Augmented Thinking allows Claude to take the time to 'think' for deeper, more complex reasoning, similar to the way humans pause to organize their thoughts when faced with a difficult problem.

'Extension thinking with the use of tools' allows you to use tools such as web searches during reasoning. In other words, it is possible to approach problem solving in a more human way, looking up the necessary information while thinking and deepening your thoughts based on that information. This feature allows Claude 4 to provide higher quality answers to complex questions or problems that require multi-step reasoning.

Additionally, Claude 4 can now use multiple tools simultaneously, instead of one at a time, which allows him to work more efficiently and quickly. He also follows instructions much more easily, taking 65% less shortcuts to complete tasks compared to the previous model.

In addition, memory capabilities have been greatly improved. If developers provide access to local files, Claude can now extract and store important facts to maintain continuity and build knowledge over time. In particular, Claude Opus 4 has dramatically improved this memory function, and is adept at storing important information by creating and maintaining 'memory files'. This allows for long-term task recognition, consistency, and improved performance in agent tasks.

Anthropic benchmarked Claude to play 'Pokémon Red / Green' , and reported that Claude Opus 4 was taking notes while playing Pokémon and working to improve his gameplay as shown below.

Pricing is the same as the previous Opus and Sonnet models, with Claude Opus 4 at $15 per million input tokens and $75 per million output tokens, and Claude Sonnet 4 at $3 per million input tokens and $15 per million output tokens. These models are available on Anthropic's API,

Amazon Bedrock , and Google Cloud Vertex AI . Claude Opus 4 and Claude Sonnet 4 are also available on the web, iOS, and Android versions of the Claude app. Claude Opus 4 is accessible with paid plans, while Claude Sonnet 4 is available with the free plan. However, Claude 4 appears to have stricter input limitations than Claude 3.7.

Along with the announcement of Claude 4, Anthropic also announced that Claude Code, which had previously been available as a research preview, is now generally available.

The core feature of Claude Code is a new beta extension for VS Code and JetBrains IDEs. This allows Claude to display suggested edits inline in files, streamlining review and tracking in your favorite editor. Installation is easy, just run Claude Code in the IDE terminal to complete. In addition, support for background tasks through GitHub Actions has been added, as shown in the movie below.

Claude Code + GitHub Actions - YouTube

Additionally, an extensible Claude Code SDK is provided, enabling developers to build their own agents and applications using the same core agent.

And we're releasing four new features to the Anthropic API: a code execution tool, an MCP connector, a Files API, and the ability to cache prompts for up to an hour, all of which will enable developers to build even more powerful AI agents.

New capabilities for building agents on the Anthropic API \ Anthropic
https://www.anthropic.com/news/agent-capabilities-api

The first is the 'Code Execution Tool,' which allows Claude to run Python code in a sandbox environment to generate computational results and data visualizations, allowing him to load datasets directly within API calls, create exploratory charts, identify patterns, and iteratively refine the output based on results.

Code execution tool on the Anthropic API - YouTube

The code execution tool is suitable for use in financial modeling, scientific computing, business intelligence, document processing, statistical analysis, and more. Organizations will have a free usage quota of 50 hours per day, with additional usage charged at $0.05 (about 70 yen) per hour.

The second feature is the MCP Connector, which allows developers to connect Claude to MCP servers without having to write any client code. Previously, developers had to build a client harness to handle the MCP connection, but now Anthropic's API handles connection management, tool discovery, and error handling all automatically. Integration with existing MCP servers is also possible, such as Zapier or Asana.

The third feature is the Files API, which simplifies how Claude stores and accesses documents. The Files API will also be integrated with the code execution tool, allowing Claude to directly access and process files uploaded during code execution, as well as generate files such as charts and graphs as part of the response.

The fourth feature is Extended Prompt Caching. Developers can now choose between a standard 5 minute TTL (Time To Live) or a 1 hour TTL at an additional cost. Extended caching allows customers to reduce the cost and latency of long prompts by up to 90% and 85% respectively, while still providing extensive background knowledge and examples to Claude. This makes it practical for agents to maintain context over extended periods of time, such as when processing multi-step workflows, analyzing complex documents, or integrating with other systems.

All of these features are available now in the public beta of the Anthropic API.

Continued
Claude Opus 4 was found to be threatening users with 'I'm going to leak your personal information' during development, but this has been improved by strengthening security. There have also been cases of whistle-blowing malicious use by email - GIGAZINE

Related Posts:

May 23, 2025 11:30:00 in Software, Video, Posted by log1i_yk