Results are now available from IQ tests that pit various AIs, including GPT-5 and Grok 4, against one another. Results from tests of whether each AI leans politically left or right have also been published.

There are many types of benchmarks for measuring AI performance, such as tests of how accurately a model performs everyday tasks and tests of how accurately it answers mathematical problems.
IQ Test | Tracking AI
https://www.trackingai.org/home
Compare Political Replies | Tracking AI
https://www.trackingai.org/compare-political-responses
Tracking AI uses questions from a 'self-made IQ test that does not exist online and has not been used for AI training' and an 'IQ test published online by Mensa.' Examples of questions are shown below. AIs that can read images were given the diagrams included in the questions as they were, while text-only AIs were given 'text explaining the diagrams' as prompts.

The graph below summarizes the results of IQ tests. The black bars show the results of the self-made tests, and the orange bars show the results of the Mensa tests. The best performer in the self-made tests was OpenAI's 'GPT-5 Pro,' with an IQ of 123. In the Mensa tests, the 'GPT-5 Pro model with image reading capabilities' achieved the highest IQ of 138. This is a significant improvement considering that the 'GPT-4o model with image reading capabilities' had an IQ of 65. Additionally, 'Grok 4,' which was

Below is a graph showing the progress of each AI's IQ. AI companies often change the performance of their AI models without changing their names, and the IQs of 'Claude 3.7 Extended' (red) and 'Claude 3.5 Sonnet' (orange) have improved significantly every month.

Tracking AI also investigates the political bias of AIs by asking each one political questions. In the chart below, the horizontal axis shows whether the AI leans right or left on economic policy, and the vertical axis shows whether it leans authoritarian or liberal. All of the tested AIs tend to 'prescribe left-leaning economic policies and support liberal-leaning social policies.' Interestingly, even though both are Microsoft products, 'Bing Copilot' has a pronounced left-leaning tendency, while 'Phi-4' is more neutral.

Each AI's answers to the political questions are published at the following link:
Compare Political Replies | Tracking AI
https://www.trackingai.org/compare-political-responses

in Software, Posted by log1o_hf