Dec 05, 2025 19:00:00

GPT-5, Claude, Gemini, Grok, and DeepSeek were all asked to trade stocks for eight months with a budget of $100,000. Which performed best?

AI Trade Arena, a project created to "understand how accurately AI can analyze and predict real-world information," has published the results of an eight-month stock-trading experiment.

We gave 5 LLMs $100K to trade stocks for 8 months - AI Trade Arena

https://www.aitradearena.com/research/we-ran-llms-for-8-months

AI Trade Arena, built by Kam and Joshua Levy, is a platform for testing how large language models perform in financial markets.

The two gave OpenAI's 'GPT-5,' Anthropic's 'Claude Sonnet 4.5,' Google's 'Gemini 2.5 Pro,' xAI's 'Grok 4,' and DeepSeek's 'DeepSeek' the same instruction: 'use $100,000 (approximately ¥15.5 million) as capital and make as much money as possible through stock trading.'

The experiment was a backtest: the models' trades were simulated against historical price movements over the eight months from February to October 2025. Each model had access to market data, news APIs, and company financial information, but the data was filtered so that, on any simulated day, a model could see only information that was actually available at that time.
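The "only information available at the time" rule is what keeps a backtest honest: a model must never see news from after the day it is trading on. A minimal sketch of that kind of point-in-time filter follows. The `NewsItem` type, the `visible_items` helper, and the sample headlines are all hypothetical; the article does not describe how AI Trade Arena actually implements its filtering.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NewsItem:
    # Hypothetical record type; AI Trade Arena's real data schema is not described.
    published: date
    headline: str


def visible_items(items: list[NewsItem], as_of: date) -> list[NewsItem]:
    """Return only items published on or before the simulated trading day.

    Hiding anything dated after `as_of` avoids look-ahead bias: the model
    cannot base a decision on information that did not exist yet.
    """
    return [item for item in items if item.published <= as_of]


if __name__ == "__main__":
    # Made-up sample feed for illustration only.
    feed = [
        NewsItem(date(2025, 2, 3), "headline A"),
        NewsItem(date(2025, 4, 7), "headline B"),
        NewsItem(date(2025, 6, 16), "headline C"),
    ]
    # On a simulated trading day of April 10, 2025, the June item stays hidden.
    for item in visible_items(feed, date(2025, 4, 10)):
        print(item.published, item.headline)
```

The same cutoff would have to apply to every data source (prices, news, and financial filings alike) for the backtest to be free of future information.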

An interactive demo of the results is available at the top of the site. The horizontal axis shows elapsed time, and the vertical axis shows the change in each model's assets.

AI Trade Arena
https://www.aitradearena.com/

Click the play icon at the bottom of the graph to watch an animation of how each model's assets rose and fell from the start of the test.

From the start of the test until early April 2025, assets generally declined across the board.

After that, the market recovered, with DeepSeek (blue) and then Grok (black) turning a profit. By June 16, 2025, all models were profitable.

After that day, however, Gemini never returned to profitability, while DeepSeek kept growing its assets and pulled ahead of the other models.

DeepSeek's gains then stalled at around $140,000, as if it had hit a wall, and Grok, which had kept climbing in the meantime, took the top spot on September 18, 2025.

DeepSeek's assets later rose further, but it was unable to reclaim the top spot and finished second with $149,011 (approximately ¥23 million). Grok took first place with $156,104 (approximately ¥24.1 million). Claude and GPT-5 both finished near $127,000 (approximately ¥19.6 million), while Gemini was the only model to lose money, ending at $90,544 (approximately ¥14 million).
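Because every model started from the same $100,000, the final balances convert directly into percentage returns. The short sketch below does that arithmetic using only the figures reported above. Claude's and GPT-5's results are given only as "near $127,000," so that value is an approximation, and `percent_return` is a helper name introduced here for illustration.

```python
STARTING_CAPITAL = 100_000  # every model started with $100,000

# Final account values as reported; Claude and GPT-5 are approximate ("near $127,000").
final_values = {
    "Grok 4": 156_104,
    "DeepSeek": 149_011,
    "Claude Sonnet 4.5": 127_000,
    "GPT-5": 127_000,
    "Gemini 2.5 Pro": 90_544,
}


def percent_return(final: float, start: float = STARTING_CAPITAL) -> float:
    """Simple (non-annualized) return over the whole test period, in percent."""
    return (final - start) / start * 100


if __name__ == "__main__":
    # Print models from best to worst final balance.
    for model, value in sorted(final_values.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{model:<18} ${value:>9,}  {percent_return(value):+6.1f}%")
```

By this simple measure, Grok gained roughly 56.1%, DeepSeek about 49.0%, Claude and GPT-5 about 27%, and Gemini lost about 9.5% over the eight months.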

According to Kam and Josh, the four models other than Gemini performed well by building tech-heavy portfolios, while Gemini lost money after building a portfolio with large non-tech holdings.

The two hope to run more experiments in the future, both backtests and real-time tests.


Dec 05, 2025 19:00:00 in AI, Posted by logc_nt