Alibaba releases 'Z-Image,' a high-speed, high-quality image generation AI



Researchers at Alibaba have developed a model called 'Z-Image' that excels at generating realistic images. Z-Image has 6 billion parameters and is characterized by its ability to generate high-quality images in a short amount of time.

GitHub - Tongyi-MAI/Z-Image

https://github.com/Tongyi-MAI/Z-Image

Tongyi-MAI/Z-Image-Turbo · Hugging Face
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

There are three variations of Z-Image, each with different features.

・Z-Image-Base : The base model.

・Z-Image-Turbo : A distilled version of Z-Image that matches or exceeds leading competing models with only eight NFEs (number of function evaluations). It achieves sub-second inference latency on the enterprise-grade NVIDIA H800 GPU and fits into consumer devices with 16GB of VRAM. It excels at photorealistic image generation, bilingual text rendering (English and Chinese), and robust instruction adherence.

・Z-Image-Edit : A derivative model fine-tuned for image editing tasks. It supports creative image-to-image generation, follows instructions well, and can make precise edits based on natural language prompts.
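As a rough sanity check on the 16GB VRAM figure, the memory needed just to hold 6 billion parameters can be estimated from the bytes used per parameter. The sketch below is a back-of-the-envelope calculation only: the listed precisions are assumptions for illustration, and it ignores the text encoder, VAE, and activations, so real usage will be higher.

```python
# Weights-only memory estimate for a 6-billion-parameter model.
# Assumption: the precisions below are illustrative, not confirmed
# settings for Z-Image. Activations and auxiliary models are ignored.

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 6e9  # 6 billion parameters, as stated in the article

for name, size in {"float32": 4, "bfloat16": 2, "int8": 1}.items():
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, size):.1f} GiB")
```

Under these assumptions, 16-bit weights take roughly 11 GiB, which leaves some headroom on a 16GB card, while 32-bit weights alone would not fit.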

At the time of writing, only Z-Image-Turbo is available. You can try out the demo version running in your browser from the following page:

Z Image Turbo - a Hugging Face Space by Tongyi-MAI
https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo



Z-Image has 6 billion parameters, fewer than many existing models, but it is said to generate photorealistic images on par with models that have orders of magnitude more parameters.



It also supports rendering text within images and is touted as being excellent at accurately rendering English and Chinese. Judging from the released images, it appears to be able to generate some Japanese as well.



When I tried it on the demo site, I had difficulty generating Japanese text.



It also has reasoning capabilities and can read and infer what is in an image. The left half of the image below shows the classic chickens-and-rabbits problem, in which the number of chickens and rabbits is calculated from the total number of heads and legs, while the right half shows a famous passage from the Chinese poem 'After Passing the Examination' (登科後).
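For reference, the chickens-and-rabbits calculation can be sketched in a few lines. This is a minimal illustration of the arithmetic, not anything Z-Image itself runs. The function name is hypothetical, and the sample figures (35 heads, 94 legs) are the classic textbook values, not necessarily those shown in the image.

```python
# Chickens-and-rabbits problem: every animal has one head,
# chickens have 2 legs and rabbits have 4.
#   chickens + rabbits       = heads
#   2*chickens + 4*rabbits   = legs
# Subtracting 2*heads from legs leaves 2 extra legs per rabbit.

def solve_chickens_rabbits(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, rabbits) for the given head and leg counts."""
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        raise ValueError("no whole-number solution")
    rabbits = extra_legs // 2
    chickens = heads - rabbits
    if chickens < 0:
        raise ValueError("no non-negative solution")
    return chickens, rabbits

# Sample values are an assumption (the classic textbook instance).
print(solve_chickens_rabbits(35, 94))  # prints (23, 12)
```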



You can also perform editing operations such as changing the composition or style of the image while preserving the characteristics of the original image.



In evaluation tests, it demonstrated performance on par with major models such as 'Qwen-Image' and 'Seedream 4.0.'



in AI, Posted by log1p_kr