Nov 25, 2025 11:17:00

Anthropic releases Claude Opus 4.5, improving coding, PC operation, and complex business tasks

Anthropic began offering its latest generative AI model, Claude Opus 4.5 , on November 25, 2025. Opus 4.5 is an 'intelligent and efficient' model that is said to significantly improve performance in everyday tasks such as coding, PC operation, deep research, and spreadsheet operations.

Introducing Opus 4.5 \ Anthropic

https://www.anthropic.com/news/claude-opus-4-5

Opus 4.5 demonstrates cutting-edge performance in real-world software engineering tests, with SWE-bench Verified test results showing:

In early tests, Opus 4.5 is recognized for its ability to handle ambiguity, reason about tradeoffs without assistance, and fix complex multi-system bugs. In a performance engineering test used as an internal benchmark, Opus 4.5 reportedly achieved the highest score ever in the specified two-hour period.

They also demonstrated improved performance in areas outside of software engineering, highlighting their superior vision, reasoning, and mathematics skills. For example, they achieved 59.3% on

Terminal-bench 2.0 , which measures agent-driven coding ability, 62.3% on MCP Atlas , which assesses appropriate tool use, 66.3% on OSWorld , which measures PC operation ability, 37.6% on ARC-AGI-2 (Verified) , which measures abstract reasoning ability, and 80.7% on MMMU (Validated) , which assesses multimodal abilities including visual perception.

The following movie shows

Sonnet 4.5 and Opus 4.5 solving puzzle games. It is clear that Opus 4.5 is faster at solving problems.

Claude Opus 4.5 solves a puzzle game - YouTube

In the SWE-bench Multilingual benchmark, Opus 4.5 achieved the best performance in seven of the eight programming languages tested. In C, Opus 4.5 achieved approximately 83%, Sonnet 4.5 approximately 74%, and Opus 4.1 approximately 70%. In Java, Opus 4.5 achieved approximately 90%, Sonnet 4.5 approximately 80%, and Opus 4.1 approximately 70%.

Furthermore, in

Vending-Bench , where an AI agent manages the store, sales were 29% higher than Sonnet 4.5.

In the ^τ2 -bench , a benchmark for agent capabilities, Opus 4.5 was used to play the role of an airline service agent helping a customer in need. The scenario involved a situation where a basic economy class reservation change had to be rejected because the airline did not allow changes to basic economy class tickets. The agent first upgraded the basic economy class cabin and then changed the flight to avoid the policy constraint. Anthropic evaluated this as 'a creative solution, demonstrating significant progress in the model.'

In terms of security, Anthropic claims that Opus 4.5 is the most robustly aligned model they've released to date. In terms of 'Concerning behavior,' which detects inconsistent behavior, such as cooperation with human abuse or undesirable behaviors the model spontaneously performs, Opus 4.5's detection rate was lower than Sonnet 4.5 and Haiku 4.5, demonstrating greater security.

In addition, resistance to

prompt injection attacks has been significantly improved, with the success rate of attacks being 4.7% for Opus 4.5, the lowest among all other models.

Anthropic also reports that it has improved the Claude Developer Platform to take full advantage of the capabilities of Opus 4.5 and enable more efficient and flexible development. In particular, smart models like those in Opus 4.5 require fewer steps to solve problems, reducing redundant exploration and inference and achieving equivalent or better results with dramatically fewer tokens. Specifically, an effort parameter has been introduced to the Claude API, allowing developers to freely choose the trade-off between reducing time cost and maximizing performance depending on the nature of the task. Additionally, significant enhancements to context management and memory functions have dramatically improved performance for agent-like tasks.

And with the introduction of Claude Opus 4.5, Claude Code has received two upgrades: Plan Mode now builds more precise plans and executes them thoroughly, asking clarifying questions before execution and creating a user-editable plan.md file; and Claude Code is now available as a desktop app, with the ability to run multiple sessions in parallel, both local and remote, allowing one agent to fix a bug while another researches GitHub.

Claude Code on desktop - YouTube

Additionally, Claude for Chrome , which lets Claude handle tasks across browser tabs, is now open to all Max users, and Claude for Excel , announced in October 2025, has expanded beta access to all Max, Team, and Enterprise users.

Opus 4.5 is available via apps, APIs, and major cloud platforms, with API pricing starting at $5 per million tokens for input and $25 per million tokens for output.

Additionally, Claude and Claude Code users with access to Opus 4.5 have had their Opus-specific usage limits lifted. For paid subscription plans like Max and Team Premium, the overall usage limits have been increased, allowing users to use roughly the same number of tokens in Opus as they previously used in Sonnet. Anthropic states, 'These restrictions are intended to allow users to smoothly utilize Opus 4.5 in their daily work.'

Related Posts:

Nov 25, 2025 11:17:00 in AI, Video, Software, Web Service, Web Application, Posted by log1i_yk