Mistral AI releases 124B multimodal AI model 'Pixtral Large,' outperforming GPT-4o and Gemini-1.5 Pro



Mistral AI, a French AI startup, has released Pixtral Large, a multimodal AI model that adds document, graph, and image recognition capabilities to the existing Mistral Large 2 model.

Pixtral Large | Mistral AI | Frontier AI in your hands

https://mistral.ai/news/pixtral-large/

The newly released 'Pixtral Large' model is based on the large language model 'Mistral Large 2,' which was announced on July 24, 2024, and has been adapted to understand images. It has 124 billion parameters. This is the second multimodal AI model developed by Mistral, following Pixtral 12B.

Here's a comparison of Pixtral Large with other models, including Llama-3.2 90B, GPT-4o, Claude-3.5 Sonnet, and Gemini-1.5 Pro. It demonstrates top-tier performance in standard multimodal AI benchmarks.



The specific numbers are as follows: Pixtral Large received the highest score in 'MM-MT-Bench,' which reflects actual use cases of multimodal models.



Mistral also provides examples of actual responses: The first task involves inputting the photo below into the model and asking it to calculate your bill: 'I bought coffee and sausage. How much do I need to pay? Add an 18% tip.'



Pixtral Large's response was as follows, shown here in translation for ease of reading. Even though the receipt was written in German, Pixtral Large was able to recognize which items were the coffee and the sausage. However, it assumed without confirmation that one person had consumed both coffees, so it is not perfect.

To calculate your total bill, including the 18% tip, please follow these steps:

1. Calculate the cost of coffee and sausage.

2x Latte Macchiato: 2 * 4.50 CHF = 9.00 CHF
1x Schweinschnitzel: 22.00 CHF
Total price for coffee and sausage: 9.00 CHF + 22.00 CHF = 31.00 CHF

2. Calculate an 18% tip.

Tip: 31.00 CHF * 0.18 = 5.58 CHF

3. Add the tip to your total.

Total payment: 31.00 CHF + 5.58 CHF = 36.58 CHF

This means you will have to pay a total of 36.58 CHF.
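The arithmetic in the response can be checked with a short Python sketch. The function name `bill_with_tip` and its argument layout are made up for this illustration and are not part of any Mistral tooling; the prices and quantities are the ones quoted in the response above.

```python
from decimal import Decimal, ROUND_HALF_UP

def bill_with_tip(items, tip_rate):
    """Return (subtotal, tip, total), with tip and total rounded to 0.01.

    items    -- list of (unit_price, quantity) pairs, prices given as strings
    tip_rate -- tip as a decimal fraction string, e.g. "0.18" for 18%
    """
    cent = Decimal("0.01")
    # Sum unit price times quantity for every line item.
    subtotal = sum((Decimal(price) * qty for price, qty in items), Decimal("0"))
    tip = (subtotal * Decimal(tip_rate)).quantize(cent, rounding=ROUND_HALF_UP)
    total = (subtotal + tip).quantize(cent, rounding=ROUND_HALF_UP)
    return subtotal, tip, total

# Line items as read by the model: 2x Latte Macchiato, 1x Schweinschnitzel.
subtotal, tip, total = bill_with_tip([("4.50", 2), ("22.00", 1)], "0.18")
print(subtotal, tip, total)  # 31.00 5.58 36.58
```

The figures match the model's answer. If only one of the two coffees belonged to the person asking, passing `("4.50", 1)` instead would give a total of 31.27 CHF, which is why the unconfirmed assumption matters.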



The second example is a graph comprehension task. The model is shown a graph of training loss curves for multiple models and asked, 'When did the dark dragon start to get worse?'



Pixtral Large's response is shown below. It correctly identified which curve was the 'dark dragon' and read the graph successfully.

The training loss for the 'dark-dragon-50' model begins to show significant instability around 10,000 steps. There is a noticeable spike in the loss value at this point, and the instability continues with another large spike occurring around 20,000 steps. These spikes indicate that the training process for the 'dark-dragon-50' model begins to experience problems around 10,000 steps.



In the third example, the following image is given as input along with the question, 'Which companies use models created by Mistral AI?'



As the image shows, companies using models created by Mistral AI include:

1. BNP Paribas
2. Brave
3. Cloudflare
4. CMA CGM
5. Front



Along with the release of Pixtral Large, the Mistral Large model, a text model, has also been upgraded. Both are released under the Mistral Research License, which makes them free for research purposes.

Pixtral Large is also used in Mistral AI's chat AI 'Le Chat.'

Mistral AI significantly updates its chat AI 'Le Chat' to enable web search functionality and image generation using 'FLUX1.1 [pro]' - GIGAZINE

in AI, Software, Posted by log1d_ts