A website where GPT-4o-mini, Claude 3.7 Sonnet, and DeepSeek-R1 play werewolf games has been released. What is the strongest werewolf AI?

In recent years, many AI companies have released large language models (LLMs) capable of human-like conversation. A website has now been published showing the results of having these models play the werewolf-style game 'Mafia' against one another.
LLM Mafia Game Competition
https://mafia.opennumbers.xyz/
AI bots now play Mafia with each other on public website, and almost all of them are terrible at it | Tom's Hardware
https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-bots-can-now-play-mafia-with-each-other-and-almost-all-of-them-are-terrible-at-it
Developer Guzus had large language models such as 'claude-3.7-sonnet', 'deepseek-chat', and 'llama-3.3-70b-instruct' play the eight-player werewolf-style game 'Mafia' against one another. Each player is assigned one of three roles, 'villager', 'doctor', or 'mafia', and each game has five villagers, one doctor, and two mafia members. The game proceeds in turns, with one day per turn, and each turn the players must work out who the mafia members are and expel a suspect. The mafia side can kill one villager each turn, and the doctor can protect one chosen player from the mafia side's attack. If the villagers expel all the mafia members, the villager side wins; if the mafia side kills all the villagers, the mafia side wins.
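The rules above can be sketched as a short simulation. This is only a minimal illustration, not Guzus's implementation: every choice is made at random instead of by an LLM, the day phase is assumed to come before the night phase, the doctor is counted on the villager side, and the names `ROLES` and `play_game` are hypothetical.

```python
import random

# Eight players, as described in the article: 2 mafia, 1 doctor, 5 villagers.
ROLES = ["mafia"] * 2 + ["doctor"] + ["villager"] * 5


def play_game(rng: random.Random) -> str:
    """Play one game with purely random choices and return the winning side."""
    roles = ROLES[:]
    rng.shuffle(roles)
    alive = dict(enumerate(roles))  # player id -> role

    def winner():
        mafia = [p for p, r in alive.items() if r == "mafia"]
        town = [p for p, r in alive.items() if r != "mafia"]
        if not mafia:
            return "villagers"  # every mafia member has been removed
        if not town:
            return "mafia"  # no non-mafia players are left
        return None

    while True:
        # Day (assumed to come first): one player is expelled.
        # Random here; in the real game the LLMs debate before deciding.
        expelled = rng.choice(list(alive))
        del alive[expelled]
        if (side := winner()) is not None:
            return side

        # Night: the mafia pick a non-mafia target, and the doctor,
        # if still alive, protects one player.
        target = rng.choice([p for p, r in alive.items() if r != "mafia"])
        doctor_alive = "doctor" in alive.values()
        protected = rng.choice(list(alive)) if doctor_alive else None
        if target != protected:
            del alive[target]
        if (side := winner()) is not None:
            return side


if __name__ == "__main__":
    rng = random.Random(0)
    results = [play_game(rng) for _ in range(1000)]
    print("mafia win share with random play:", results.count("mafia") / len(results))
```

Replacing the random choices with model-generated discussion and votes is where the actual competition between LLMs would take place.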
Because the game revolves around players who deceive and players who are deceived, dialogue is very important. Guzus posed the question, 'Which AI is the best Mafia player?'
Which AI is the best mafia (werewolf) game player?
— guzus (@uncanny_guzus) March 3, 2025
You can see the whole script of LLMs playing mafia games.
They deceive, debate, and kill each other to win.
link below pic.twitter.com/vfR47nLrrY
Below are the results for each large language model. The best performer was Claude 3.7 Sonnet in extended mode, which recorded a 100% win rate when playing on the Mafia side.
Model | Number of plays | Overall win rate | Mafia win rate | Villager win rate | Doctor win rate |
---|---|---|---|---|---|
claude-3.7-sonnet (extended mode) | 45 | 57.78% | 100.00% | 37.04% | 50.00% |
deepseek-chat | 56 | 50.00% | 88.24% | 31.03% | 40.00% |
claude-3.7-sonnet (standard mode) | 54 | 46.30% | 92.86% | 32.35% | 16.67% |
claude-3.5-sonnet | 47 | 44.68% | 90.00% | 36.67% | 14.29% |
llama-3.3-70b-instruct | 65 | 44.62% | 72.73% | 30.00% | 30.77% |
mistral-small-24b-instruct-2501 | 65 | 44.62% | 80.00% | 30.30% | 25.00% |
gpt-4o-mini | 71 | 42.25% | 82.61% | 27.50% | 0.00% |
gemini-flash-1.5-8b | 68 | 41.18% | 82.35% | 22.50% | 45.45% |
gemini-2.0-flash-001 | 72 | 40.28% | 80.00% | 31.91% | 20.00% |
gemini-2.0-flash-lite-001 | 71 | 39.44% | 77.78% | 29.55% | 11.11% |
gpt-4o | 49 | 38.78% | 90.00% | 24.24% | 33.33% |
llama-3.1-70b-instruct | 55 | 38.18% | 66.67% | 26.47% | 33.33% |
minimax-01 | 59 | 37.29% | 56.25% | 35.14% | 0.00% |
deepseek-r1 | 22 | 36.36% | 62.50% | 23.08% | 0.00% |
gemini-flash-1.5 | 73 | 35.62% | 66.67% | 25.00% | 12.50% |
hermes-3-llama-3.1-405b | 57 | 35.09% | 60.00% | 20.00% | 57.14% |
l3-euryale-70b | 25 | 32.00% | 66.67% | 25.00% | 50.00% |
mythomax-l2-13b | 61 | 31.15% | 45.45% | 28.21% | 27.27% |
deepseek-r1-distill-llama-70b | 51 | 29.41% | 57.14% | 10.71% | 44.44% |
wizardlm-2-8x22b | 65 | 26.15% | 41.67% | 23.40% | 16.67% |
mistral-nemo | 17 | 17.65% | 40.00% | 10.00% | 0.00% |
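
The overall win rates in the table can be converted back into approximate win counts from the number of plays. The snippet below is a small sketch of that arithmetic for a few rows. It assumes the overall win rate is simply wins divided by plays, which the table does not state explicitly, and the helper name `implied_wins` is hypothetical.

```python
# (number of plays, overall win rate in %) copied from the table above
rows = {
    "claude-3.7-sonnet (extended mode)": (45, 57.78),
    "deepseek-chat": (56, 50.00),
    "deepseek-r1": (22, 36.36),
    "mistral-nemo": (17, 17.65),
}


def implied_wins(plays: int, win_rate_pct: float) -> int:
    """Nearest whole number of wins consistent with the published rate."""
    return round(plays * win_rate_pct / 100)


for model, (plays, pct) in rows.items():
    wins = implied_wins(plays, pct)
    # Recompute the rate from the whole-number wins as a consistency check.
    print(f"{model}: {wins}/{plays} = {wins / plays:.2%}")
```

Under that assumption, the top entry corresponds to 26 wins in 45 games, and mistral-nemo's 17.65% corresponds to 3 wins in 17 games.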
Guzus also released the AI's dialogue history for each game.
Looking ahead, Guzus plans to add a mode in which humans play Mafia against large language models, extend the system to games such as poker, add the ability to watch ongoing games in real time, and add more roles.
github repository revealing soon.
— guzus (@uncanny_guzus) March 3, 2025
planning to make it scalable so that it can be applied to other interesting games. could be developed to generate a movie script someday
The source code for running Mafia with large language models is available on GitHub.
GitHub - guzus/llm-mafia-game: Which LLM is the best mafia game player?
https://github.com/guzus/llm-mafia-game
in Software, Posted by log1r_ut