Aug 08, 2025 12:35:00

The charts shown at GPT-5's release were so bizarre that they were nicknamed 'VIBECHART', after AI vibe coding, and sparked controversy.

OpenAI announced its flagship AI model, GPT-5, on August 8, 2025 (Japan time). When a new AI model is announced, it's customary to highlight the AI's performance by visualizing its benchmark results in graphs. However, the graphs used in the GPT-5 announcement were found to be clearly inconsistent with the actual numbers, sparking widespread criticism.

VIBECHART.NET
https://www.vibechart.net/

For example, the graph below shows the SWE-bench benchmark results for GPT-5, o3, and GPT-4o, as published on the OpenAI release page at the time of writing.

And here's the graph that was shown shortly after the announcement: GPT-5's standard model (light pink) scored 52.8% and its inference model (dark pink) scored 74.9%, while o3 scored 69.1% and GPT-4o scored 30.8%. For some reason, the bars for the latter two are the same height, and although the standard model scored lower than o3, its bar is somehow taller.

GPT-5

The marketing: 'It's like having a team of PhDs in your pocket!'

Also the marketing: This y-axis ❓

#DataViz #ChatGPT

— Tyler Morgan-Wall ( @tylermw.com ) August 8, 2025 2:12

The graph below compares how GPT-5 and o3 respond to impossible tasks. It shows the 'deception rate' (the rate at which the model states something that isn't true), so lower bars are better. This is the corrected graph.

And here's the graph that was shown immediately after the announcement, which also appeared in OpenAI's GPT-5 announcement stream. The obvious problem is the leftmost 'Coding deception' group, where GPT-5's bar is drawn lower than o3's 47.4%, yet it is labeled '50.0%.'

On the social news site Hacker News, many people posted comments such as, 'Why are they so sloppy? Because they want to spread the word with a funny, clumsy graph? Surely an AI can do that, even converting test results from an Excel document into a visual graph,' 'OpenAI definitely should have had ChatGPT review their slides,' and 'Maybe this was just sloppy, not intentionally misleading, but it's still a bad look for a company that's wasting billions of dollars and promising to revolutionize everything humankind does, when they can't even put together a decent PowerPoint.'


Aug 08, 2025 12:35:00 in Software, Posted by log1i_yk