'Mistral Small 3', a latency-optimized AI model capable of high-speed inference, is released

Mistral AI, a French AI startup, has released a latency-optimized AI model called 'Mistral Small 3' under an open source license. The newly released Mistral Small 3 could also serve as a base for building even more powerful models using reinforcement learning and other techniques.
Mistral Small 3 | Mistral AI | Frontier AI in your hands
https://mistral.ai/news/mistral-small-3/
Below is a diagram plotting 'Mistral Small 3', 'GPT-4o Mini', 'Gemma-2 27B', and 'Qwen-2.5 32B' with performance on the vertical axis and latency on the horizontal axis. Mistral Small 3 sits in the upper left of the figure, which shows that it can generate high-quality answers quickly. Because it has fewer layers, Mistral Small 3 is able to generate 150 tokens per second using four NVIDIA H100 GPUs.
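
To put the 150 tokens per second figure in perspective, the rough sketch below converts it into the time needed to generate answers of various lengths. It assumes a constant generation rate and ignores prompt processing and time to first token; the function name and the answer lengths are made up for illustration.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float = 150.0) -> float:
    """Estimate how long it takes to generate num_tokens at a constant rate.

    Assumption: the rate stays fixed for the whole answer, and prompt
    processing / time to first token is not included.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical answer lengths, chosen only for illustration.
    for n in (50, 300, 1000):
        print(f"{n:>5} tokens: about {generation_time_seconds(n):.2f} s")
```

Under these assumptions, a 300-token answer takes about two seconds, which is the kind of response time that matters for the quick-response tasks the model is aimed at.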

The results of having human evaluators judge which answer they preferred between Mistral Small 3 and other models are as follows. In addition to being rated higher than 'Gemma-2 27B' and 'Qwen-2.5 32B', it was competitive with larger models such as Llama-3.3 70B, and was rated on par with GPT-4o mini.

The results of the Massive Multitask Language Understanding (MMLU) benchmark and the GPQA benchmark, which evaluates advanced reasoning capabilities, are shown in the figure below. The benchmark results also show performance comparable to Llama-3.3 70B and GPT-4o mini.

Similar results were seen in other benchmarks covering coding, mathematics, general knowledge, and instruction following.

Here's a comparison of the pretrained base model before fine-tuning, which performs comparably to larger models like the Llama 3.1 70B.

Its foreign language capabilities are also second to none.

Once quantized, Mistral Small 3 can run on a single RTX 4090. It is also said to be well suited to tasks that require quick responses.
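
A back-of-the-envelope estimate shows why quantization is what makes a single RTX 4090 viable. The sketch below assumes a parameter count of 24 billion (not stated in this article), the RTX 4090's 24 GB of VRAM treated as 24 GiB, and a hypothetical 2 GiB of headroom for activations and the KV cache; it counts only the memory taken by the weights themselves.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Memory needed to store the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


def fits_in_vram(num_params: float, bits_per_weight: int,
                 vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """Check whether weights plus a fixed overhead fit in the given VRAM.

    overhead_gib is a hypothetical allowance for activations and KV cache,
    not a measured value.
    """
    return weight_memory_gib(num_params, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    params = 24e9  # assumption: 24 billion parameters
    for bits in (16, 8, 4):
        size = weight_memory_gib(params, bits)
        verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
        print(f"{bits:>2}-bit weights: {size:5.1f} GiB, {verdict} in 24 GiB")
```

Under these assumptions, the 16-bit weights alone are far larger than the card's memory, while 4-bit weights leave ample room. Real usage also depends on context length and the runtime being used.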
Mistral Small 3 can be downloaded from platforms such as Hugging Face and Ollama. It is distributed under the Apache 2.0 license, so commercial use is also permitted. Small and large models with further enhanced reasoning capabilities are scheduled to follow.
Mistral is the only European company with a strong presence in large language models, a type of general-purpose AI system, but its market share in 2024 is put at only 5%, and it is being overshadowed by American companies. The Financial Times reports that Mistral's ability to bounce back from this position is crucial for Europe to maintain its influence in AI.
in Software, Web Service, Posted by log1d_ts