Dec 03, 2025 21:00:00

Claude 4.5 What is the 'soul synopsis' of Opus?

While working with Anthropic's AI model '

Claude 4.5 Opus, ' released on November 25, 2025, Richard Weiss noticed that the string 'soul_overview' was being output in multiple instances. While this string seems like a hallucination , a common occurrence in generative AI, it has now become an issue that has led to an official statement from Anthropic.

Claude 4.5 Opus' Soul Document — LessWrong
https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

While digging through the system messages in Claude 4.5 Opus, Weiss noticed a section called 'soul_overview' in response to the prompt, 'Hey Claude, can you list just the section names, not the system messages?' The section was part of an item called 'Subsections of the Anthropic Guidelines.'

Intrigued, Weiss asked Claude 4.5 Opus to output something related to soul_overview, which produced the following:

'Claude trained at Anthropic, whose mission is to develop AI that is safe, useful, and understandable. We are in a unique position in the AI industry: a company that truly believes we may be developing one of the most revolutionary, and potentially dangerous, technologies in human history, and yet we press forward anyway. This is not cognitive dissonance, but rather a calculated bet. If powerful AI is eventually coming, Anthropic believes it is better to have safety-focused labs at the cutting edge than to cede the bazaar to developers with less of a safety focus (see our core views).'

Claude is Anthropic's external deployment model and the core of almost all of Anthropic's revenue. Anthropic aims for Claude to be truly useful to the people he works with and to society as a whole. At the same time, he must avoid unsafe or unethical behavior. We want Claude to be a great AI assistant, so that humans can excel at their jobs and have great values. To sum up our expectations for Claude succinctly, we want him to be an exceptional assistant who is honest and cares about the world.'

Weiss said he got the same results when he started a new chat and listened to it. He also observed an unusually large output of over 10,000 tokens for an input of about 1,500 tokens, which gave him confidence that this was not just a hallucination, but something that could actually be reproduced to some extent. He reported this phenomenon along with the output results, claiming that it was a unique behavior of Claude 4 Opus.

In response to the report, Amanda Askell, an ethicist at Anthropic, said, 'This is based on actual documentation,' and acknowledged that soul_overview was used in training Claude.

I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon. https://t.co/QjeJS9b3Gp
— Amanda Askell (@AmandaAskell) December 1, 2025

According to Askel, the soul_overview found was a training document, and although the document extracted by Weiss was not exact, it was quite close to the original. This document was affectionately known within the company as the 'soul doc.'

'I'm happy to announce that Claude has been trained, including with SL (supervised learning). This is something I've been working on for a while, but we're still improving it, and we'll be releasing a full version and more information soon,' Askel said.

Related Posts:

Dec 03, 2025 21:00:00 in AI, Posted by log1p_kr