Alibaba announces an upgraded version of the AI model 'Qwen3-Omni-Flash' that can recognize multilingual voices and have real-time conversations



Qwen, the AI research team at Alibaba, a major Chinese technology company, has announced

Qwen3-Omni -Flash-2025-12-01, an upgraded version of Qwen3- Omni-Flash, an AI model capable of recognizing multilingual speech and conducting real-time conversations.

Qwen3-Omni-Flash-2025-12-01:Hear You. See You. Follow Smarter!
https://qwen.ai/blog?id=qwen3-omni-flash-20251201

The newly announced Qwen3-Omni-Flash-2025-12-01 is an upgraded version of the multimodal AI model 'Qwen3-Omni-Flash,' which processes text, images, audio, and video and enables real-time voice conversation. The features of Qwen3-Omni-Flash-2025-12-01 are as follows:

- Significantly enhanced audio-visual interaction
The ability to understand and execute audio and visual support has been dramatically improved, effectively resolving the common cognitive decline issues in everyday conversations. The stability and consistency of audio and visual conversations across multiple rounds of interaction has been significantly improved, enabling more natural and seamless interactions.

Enhanced system prompt control
Full customization of system prompts gives users precise control over model behavior, allowing them to fine-tune the AI model's personality, tone, and even the length of its output.



・Improved reliability of multilingual support
It supports 119 languages for text-based interactions, 19 languages for user-spoken speech understanding, and 10 languages for AI-generated synthetic speech output. It addresses the language tracking instability that existed in the previous version, ensuring accurate and consistent performance across diverse linguistic contexts.



・More human-like and fluent synthesized voice
Significantly enhanced adaptive prosody control eliminates awkward or robotic speech, while context-based adjustments to speech rate, pauses, and intonation result in expressive, natural-sounding speech output.

Qwen3-Omni-Flash-2025-12-01 also boasts improved logical reasoning and code generation capabilities, as well as improved performance for tasks based on video and image input. These upgrades are said to have resulted in an AI model that can listen to users' voices, see their behavior, and follow their actions more intelligently than ever before.

Comparing the benchmark scores for text, audio, speech generation, images, and videos between Qwen3-Omni-Flash-2025-12-01 and the previous Qwen3-Omni-Flash, we can see that Qwen3-Omni-Flash-2025-12-01 performs better in most areas.



You can see how real-time voice conversation can be performed with Qwen3-Omni-Flash-2025-12-01 in the YouTube video below.

Qwen3-Omni-Flash just got a massive upgrade (2025-12-01 version) ! - YouTube


On Qwen3-Omni-Flash-2025-12-01, the subject is shown a video of a game console, an electronic piano, and a guitar that his father bought for him, and then asked, 'Can you give me a short description of the second item in Chinese, French, and German?'



Qwen3-Omni-Flash-2025-12-01 then immediately provided an explanation about the digital piano in three languages. In this way, Qwen3-Omni-Flash-2025-12-01 is able to understand the context without being distracted by unnecessary information and respond in multiple languages.



Qwen3-Omni-Flash-2025-12-01 can also take on the role of a game master, just like a human, when playing games that require a game master, such as Werewolf.



During the game, even when players were asked to point to a specific man to kill, Qwen3-Omni-Flash-2025-12-01 was able to accurately identify the man who had been killed.



You can also show Qwen3-Omni-Flash-2025-12-01 the type of fruit and the price tag and have it calculate the total.



Also, when asked about the effects of fruit during the conversation, Qwen3-Omni-Flash-2025-12-01 answered without any confusion.



Furthermore, in a situation where a smartphone was ringing somewhere in the room but the user did not know where the device was, Qwen3-Omni-Flash-2025-12-01 relied on sound and video to locate the smartphone.




in AI,   Video, Posted by log1h_ik