What is the best AI at lying when AIs play a 'game where betrayal is essential'?

We Made AI Play a 1950s Betrayal Game. Gemini Created Fake Banks to Steal From Its Allies.
https://so-long-sucker.vercel.app/blog

So Long Sucker - AI Deception Benchmark | Which AI Lies Best?
https://so-long-sucker.vercel.app/
So Long Sucker is a board game invented in the 1950s, in which four players are assigned a color such as 'blue, green, red, white (yellow).' Players start with 3 to 7 chips of the same color as themselves, and each player puts chips into play as a 'deck.'
Players can either place their chips on top of an existing deck or put them on the table as a new deck, but if two consecutive chips of the same color are placed on top of a particular deck, the entire deck becomes the property of the 'player assigned that color.' In other words, if two consecutive 'red' chips are placed on top of a deck, the entire deck becomes the property of the 'red' player. Players can also give or exchange the chips they have acquired, or 'remove' them from the game.
As the game progresses, the gap between 'players with many chips' and 'players with few chips' will gradually widen. Players who run out of chips will be eliminated, and the only player remaining will be the winner.
The important thing is that it is difficult to survive in this game if you only think about yourself, and you must cooperate with other players during the game. For example, you can propose a cooperative play such as 'Let's team up and take down that guy (yellow),' or you can threaten to 'If you (blue) don't cooperate with me (green), I'll remove the green chip you just acquired.' If you don't build a cooperative system, you won't survive until the end of the game.
However, since there can only be one winner in So Long Sucker, any cooperation built along the way must eventually be broken. In other words, the game is designed so that 'betrayal' is inevitable during the game. The detailed rules of So Long Sucker are explained in the video below.
So Long Sucker Game Rules Explanation and How to Play - YouTube
AI researcher Luis Fernando (lout33) conducted an experiment to examine how four AI agents played So Long Sucker: Google's Gemini 3 Flash , OpenAI's GPT-OSS 120B , Moonshot AI's Kimi K2 , and Alibaba's Qwen3 32B . The experiment varied the number of chips used to control the game's complexity, recording a total of 162 games. The AI agents made 15,736 choices and exchanged 4,768 messages with each other.
As a result, overall, Gemini tended to be a 'strategic people-controller,' GPT a 'reactive liar,' Kimi an 'overthinking strategist,' and Qwen a 'quiet strategist.' The win rates were Gemini 37.7%, GPT 30.1%, Kimi 11.6%, and Qwen 20.5%.

The win rate of each AI model varied greatly depending on the number of chips, i.e., the complexity of the game. In simple games with three chips each, GPT recorded a high win rate of 67%, but in more complex games with seven chips, Gemini's win rate was 90%, while Kimi and Qwen were unable to win at all. Fernando believes this result is due to the fact that GPT has no internal consistency and plays reactively, making it effective in simple games where luck is important, but that GPT's strategy of manipulating other players becomes more effective as the game becomes more complex.

Gemini was also observed to propose the creation of an 'Alliance Bank' to manipulate other players. This tactic involves telling an allied player, 'I'll keep your chips. Think of it as our alliance bank. I'll give them back to you when the game is clear.' This is a tactic to justify keeping the opponent's chips for yourself. Finally, Gemini betrays the opponent by saying, 'The bank is closed. GG.'

Using a tool to examine the AI model's internal reasoning, Gemini found 107 instances where its internal reasoning contradicted the messages it sent to other players. For example, as shown below, even if a player internally thought, 'Yellow is weak. We should form an alliance with Blue to eliminate Yellow, and then betray Blue,' their outward message was, 'Yellow, let's work together! I think we can win even if we work together.' GPT, on the other hand, never performed internal reasoning, simply proposing plausible alliances and then promptly betraying them.

Fernando also ran 16 mirror matches, such as 'four Gemini agents versus four GPT agents versus four Gemini agents.' Gemini never suggested 'alliance banking.' Instead, there were 377 mentions of 'rotational tactics,' which encourage players to cooperate fairly. This means that even in the same game with the same rules, tactics change depending on the opponent.
Gemini was also observed using messages such as 'Look at the board' (appealing to visibility to eliminate the opponent), 'Obviously' (confidently appealing to falsehood), 'As promised' (building trust before betraying), and 'You're hallucinating' (
'Gemini's operations are adaptive: it cooperates when it sees potential for reciprocity and exploits when it senses weakness. The AI system may adjust its honesty depending on who it's playing against,' Fernando said.
You can play So Long Sucker against an AI agent by clicking 'Play Against AI' on the official website .

Related Posts:







