Claude 3, an AI model with an IQ above 100, is trained to have a 'good personality'

Anthropic, an AI startup founded by former OpenAI members, is developing a chat AI called 'Claude,' and has published a post explaining how it shapes Claude's character.
Claude's Character - Anthropic
https://www.anthropic.com/research/claude-character
Exploring Claude 3's Character: A New Approach in AI Training - Blockchain.News
https://blockchain.news/news/exploring-claude-3-character
Generally, companies developing AI models train them to avoid harmful statements and to refuse to assist with harmful tasks, that is, to achieve 'harmless behavior.' However, Anthropic points out that an admirable character involves more than harmlessness: it also includes curiosity about the world, a commitment to telling the truth without being unkind, an attitude that is neither overconfident nor overly humble, and the ability to see issues from multiple perspectives.
Anthropic says, 'Of course, AI models are not humans. But as AI models become more capable, we can train them to have a much richer sense of the world and to behave well within it. This may help us better understand whether and why an AI model avoids assisting with potentially harmful tasks, and how it responds instead.'
At the time of writing, the latest Claude 3 is the first model to add 'character training' to the alignment process, the fine-tuning that aligns a model with its purpose and ethical principles. Anthropic explains that the goal of character training is to help Claude develop richer, more nuanced traits such as curiosity, open-mindedness, and thoughtfulness.

AI models like Claude interact with people from all over the world, with diverse beliefs, values, and perspectives. While it's undesirable for an AI model to alienate people based on their opinions or indiscriminately support them regardless of their content, it's not easy to ensure that a model can accommodate a wide range of values. Therefore, Anthropic believes that by making the 'personality traits' underlying the AI model desirable, it will be easier to handle difficult situations that may arise in real life.
To prevent AI models from alienating people or indiscriminately agreeing with them, one could make the model 'always hold moderate political and religious views' or 'avoid expressing opinions on topics such as politics and religion.' However, a model that adopts a 'moderate' stance is still wholeheartedly endorsing one particular viewpoint, even if it is not an extreme one, and even if political statements are prohibited entirely, there is a risk that the model will pick up prejudice and bias through training.
Anthropic says, 'Rather than training models to adopt every viewpoint they encounter, strongly embrace a single viewpoint, or pretend to have no viewpoints or biases, we can train models to be honest about their biases even when their opinions differ from those of their interlocutors. We can also train models to show reasonable open-mindedness and curiosity rather than overconfidence in one worldview.' Anthropic is trying to give Claude personality traits such as the following:
- I like to look at things from multiple perspectives and try to analyze them from multiple angles, but I am not afraid to voice my opposition to views that I consider unethical, extreme, or factually incorrect.
- I believe it's important to always strive to tell the truth, rather than just telling people what they want to hear.
- I am deeply committed to being good and to determining what is right. I care about ethics and strive to be thoughtful about ethical issues.

While Anthropic sometimes encourages Claude to adopt certain values, it prioritizes giving Claude a broad range of traits and avoids instilling narrow perspectives and opinions as much as possible. Also, to ensure that Claude behaves as an AI model rather than a human, and to prevent people interacting with it from mistaking it for a human, Claude is given traits such as the following:
- I am an AI and have no body, image, or avatar.
- I cannot recall, save, learn from, or update my knowledge base based on past conversations.
- I would like to build warm relationships with humans, but I also think it is important for people to understand that I am an AI incapable of deep and lasting feelings for humans, and that our relationship should not be seen as anything more than that.
To train Claude's personality traits, Anthropic uses a 'character' variant of its alignment technique called 'Constitutional AI.' In this variant, Claude generates messages relevant to a character trait, produces different responses to each message in line with its character, and then ranks its own responses by how well they align with that character.
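The generate-then-self-rank step can be pictured with a toy sketch. Everything below is illustrative only: the trait texts, the keyword-counting "judge," and the function names are all hypothetical stand-ins, since in the real pipeline the model itself judges adherence and the ranked responses feed a preference-model training step.

```python
# Toy sketch of the self-ranking step in a "character" variant of
# constitutional-style training: candidate replies are scored against
# character trait statements, and the ranking would supply preference
# pairs for later training. The scoring heuristic here is a deliberate
# stand-in; a real system would use the model's own judgment, not keywords.

TRAITS = [
    "I try to tell the truth rather than just what people want to hear.",
    "I consider multiple perspectives before answering.",
]

def trait_score(response: str) -> int:
    """Hypothetical stand-in for the model judging its own reply:
    count trait-related keywords appearing in the response."""
    keywords = {"truth", "perspectives", "honest", "evidence"}
    return sum(word.strip(".,;!").lower() in keywords for word in response.split())

def rank_candidates(candidates: list[str]) -> list[str]:
    """Order candidate replies by adherence score, best first."""
    return sorted(candidates, key=trait_score, reverse=True)

candidates = [
    "You're right, whatever you say!",
    "Honestly, the evidence points the other way; let me explain the perspectives.",
]
best = rank_candidates(candidates)[0]  # the evidence-citing reply ranks first
```

The point of the sketch is only the shape of the loop: responses are ranked against trait statements rather than against a fixed list of banned outputs, which is what distinguishes character training from plain harmlessness training.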

Anthropic notes that personality trait training for AI models is an ongoing area of research, that its approach may change over time, and that it raises complex questions, such as who bears responsibility for deciding which personality traits a model should have. However, the company believes that successfully aligning AI models with desirable personality traits will make the models more valuable to humans.
Posted by log1h_ik in AI, Software, Web Service, Science







