May 01, 2026 11:03:00

OpenAI explains why ChatGPT and Codex started repeatedly using the word 'goblin,' citing a problem with its method of learning nerdy speech patterns.

The source code for OpenAI's agent tool, 'Codex CLI,' includes an instruction to the AI that says, 'Do not talk about goblins or raccoons.' Users had in fact reported that ChatGPT and Codex were mentioning goblins more often, and on April 29, 2026, OpenAI explained the cause of the problem.

Where the goblins came from | OpenAI
https://openai.com/index/where-the-goblins-came-from/

The following screenshot, posted by Barron Roth, who works on AI product development at Google, shows the results of searching the conversation history with GPT-5.5 for the word 'goblin.' The word appears in a variety of contexts, including topics that have little to do with goblins.

In connection with this, it has also come to light that OpenAI added a directive to its agent tool, Codex CLI, stating, 'Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless they are absolutely and obviously related to the user's query.'

OpenAI has instructed its AI coding tool, Codex, to 'not talk about goblins or raccoons' - GIGAZINE

According to OpenAI, references to fictional creatures such as 'goblins' and 'gremlins' first began to increase with GPT-5.1. The rate at which goblins came up in conversations was 0.04% with GPT-5 Thinking, but it jumped to 0.12% with GPT-5.1 Thinking, a threefold increase.

Subsequently, the frequency of goblin mentions increased dramatically in GPT-5.4, and users also pointed this out. In response, OpenAI conducted a detailed analysis and found that mentions of goblins increased when ChatGPT's personality was set to 'Nerdy.' The following shows the goblin mention rate for each personality in GPT-5.4, and Nerdy clearly stands out. According to OpenAI, Nerdy accounted for 2.5% of all replies in ChatGPT, but 66.7% of the replies that mentioned goblins came from Nerdy.
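To put the two Nerdy percentages in perspective, the following is a minimal sketch of the arithmetic. It assumes both figures are measured over the same population of GPT-5.4 replies; the function names `overrepresentation` and `relative_rate` are hypothetical and are not taken from OpenAI's analysis.

```python
# Figures quoted in the article for GPT-5.4.
nerdy_share_of_all_replies = 0.025     # Nerdy: 2.5% of all ChatGPT replies
nerdy_share_of_goblin_replies = 0.667  # Nerdy: 66.7% of replies mentioning goblins


def overrepresentation(share_of_hits: float, share_of_total: float) -> float:
    """How many times larger a group's share of hits is than its share of the total."""
    return share_of_hits / share_of_total


def relative_rate(share_of_hits: float, share_of_total: float) -> float:
    """Ratio of the group's hit rate to the hit rate of everything else.

    The overall hit rate is unknown here, but it cancels out of the ratio,
    so the two shares are enough (assumption: same reply population).
    """
    group = share_of_hits / share_of_total
    rest = (1 - share_of_hits) / (1 - share_of_total)
    return group / rest


lift = overrepresentation(nerdy_share_of_goblin_replies, nerdy_share_of_all_replies)
ratio = relative_rate(nerdy_share_of_goblin_replies, nerdy_share_of_all_replies)

print(f"Nerdy over-representation among goblin replies: {lift:.1f}x")
print(f"Nerdy goblin rate vs. all other personalities: {ratio:.1f}x")
```

Under these assumptions, Nerdy's share of goblin replies is roughly 27 times its share of all replies, and a Nerdy reply was roughly 78 times more likely to mention goblins than a reply from any other personality.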

ChatGPT's personalities are shaped by reinforcement learning, in which the model is rewarded when it produces a certain type of reply. Analysis revealed that the reward signal for 'nerd-like replies' tended to favor words such as 'goblin' and 'gremlin.' This explains why Nerdy's references to goblins and similar creatures increased.

Furthermore, Nerdy's training was found to influence the other personalities as well. The graph below shows the rate of references to 'goblins' and 'gremlins' under reinforcement learning that included Nerdy (left) and under reinforcement learning that excluded Nerdy (right). Including Nerdy in the training increases references to goblins and gremlins across the AI model as a whole.

OpenAI removed Nerdy in mid-March 2026. The graph below shows the rate of goblin and gremlin mentions over time, and the rate fell in GPT-5.4 after Nerdy was removed. However, because training for GPT-5.5 began before the root cause of the goblin problem was identified, the initial version of GPT-5.5 was still affected. OpenAI therefore added the goblin-prohibiting directive to Codex CLI's instructions to mitigate the problem.

OpenAI commented that the goblin problem helped it develop a method for quickly investigating strange behavior in models.

May 01, 2026 11:03:00 in AI, Posted by log1o_hf