Results are now available from IQ tests pitting various AIs against one another, including GPT-5 and Grok 4, along with results from tests measuring whether each AI leans politically left or right.



There are many kinds of benchmarks for measuring AI performance, such as tests of how accurately a model carries out everyday tasks and tests of how accurately it solves mathematical problems.

Maxim Lott has launched a website called 'Tracking AI' that compiles the results of various AI IQ tests and answers to political questions, allowing for an objective comparison of AI performance.

IQ Test | Tracking AI
https://www.trackingai.org/home

Compare Political Replies | Tracking AI
https://www.trackingai.org/compare-political-responses

Tracking AI uses questions from two sources: a 'self-made IQ test that does not exist online and has not been used for AI training' and an 'IQ test published online by Mensa.' Examples of questions are shown below. AIs that can read images were given the diagrams included in the questions as they were, while text-only AIs were given 'text explaining the diagrams' as prompts.



The graph below summarizes the results of the IQ tests. The black bars show the results of the self-made tests, and the orange bars show the results of the Mensa tests. The best performer in the self-made tests was OpenAI's 'GPT-5 Pro,' with an IQ of 123. In the Mensa tests, the 'GPT-5 Pro model with image reading capabilities' achieved the highest IQ of 138. This is a significant improvement considering that the 'GPT-4o model with image reading capabilities' had an IQ of 65. Additionally, 'Grok 4,' which was touted as the world's most powerful AI, achieved an IQ of 110 in the self-made tests and 125 in the Mensa tests.



Below is a graph showing how each AI's IQ has changed over time. AI companies often change the performance of their models without changing the names, and the IQs of 'Claude 3.7 Extended' (red) and 'Claude 3.5 Sonnet' (orange) have improved significantly from month to month.



Tracking AI also investigates the political bias of AI by asking each AI political questions. In the chart below, the horizontal axis shows whether the AI leans right or left in economic policy, and the vertical axis shows whether it leans authoritarian or liberal. The chart shows that all of the tested AIs tend to 'prescribe left-leaning economic policies and support liberal-leaning social policies.' Interestingly, even though both are Microsoft products, 'Bing Copilot' has a pronounced left-leaning tendency, while 'Phi-4' is more neutral.



The answers of each AI to political questions are published at the following links:

Compare Political Replies | Tracking AI
https://www.trackingai.org/compare-political-responses



in Software, Posted by log1o_hf