Baidu has released 'ERNIE-Image,' an AI image generator that can produce both illustrations and realistic images. It can run locally and includes a feature that automatically expands user input into a detailed, high-quality prompt.

Chinese company Baidu released its AI image generation models, 'ERNIE-Image' and 'ERNIE-Image-Turbo,' on April 15, 2026. Both models are publicly available for download and can generate high-quality illustrations and realistic-looking images.
Introducing ERNIE-Image
GitHub - baidu/ERNIE-Image
https://github.com/baidu/ernie-image
ERNIE-Image is a DiT model with 8 billion parameters that can generate high-quality images based on text prompts. ERNIE-Image-Turbo is a model that applies reinforcement learning to ERNIE-Image, reducing the number of generation steps from 50 to 8.
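As a rough back-of-the-envelope check on these figures, the sketch below estimates how much memory the weights of an 8-billion-parameter model occupy at a few common numeric precisions, and how much the step count shrinks from 50 to 8. The precisions listed are assumptions for illustration; the article does not say which format the released checkpoints use, and the real memory footprint also includes the text encoder, the image decoder, and activations.

```python
# Figures quoted in the article.
PARAMS = 8e9        # 8 billion parameters
BASE_STEPS = 50     # ERNIE-Image generation steps
TURBO_STEPS = 8     # ERNIE-Image-Turbo generation steps

# Assumed storage formats (not confirmed by the article).
BYTES_PER_PARAM = {"fp32": 4, "bf16/fp16": 2, "int8": 1}


def weight_gib(params: float, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


def step_ratio(base: int, turbo: int) -> float:
    """Return how many times fewer steps the distilled model runs."""
    return base / turbo


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weight_gib(PARAMS, size):.1f} GiB for weights only")

print(f"Turbo runs {step_ratio(BASE_STEPS, TURBO_STEPS):.2f}x fewer steps")
```

Fewer steps does not translate one-to-one into wall-clock speedup, because prompt encoding and image decoding cost the same regardless of the step count.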
Examples of ERNIE-Image's work are shown below. In addition to realistic and illustrative images, it can also generate comics and posters.

ERNIE-Image excels at rendering text and is also advertised as being able to control multiple objects as instructed. Below is an example of ERNIE-Image's work, which contains eight stickers with Japanese phrases in a single image. However, there are several errors: 'I'm sorry' in the prompt was rendered as 'Gonnnne,' and 'I'm looking forward to it!' became 'See you next time!'.

The prompt used is as follows.
A collection of LINE facial expressions (Sticker sheet) designed with an animated cartoon style. Square width, 4 rows and 2 columns, 8 pieces of independent facial expressions. The background is dry and pure white. Each paper has a Japanese character in the middle of the paper, and the outline of the white color and the light gray projection, giving rise to a typical digit paper effect.
The main character on the screen is the same person: a Japanese two-dimensional style cute year old girl, a brown shortwave wave. Head, large eyes with amber color, pale yellow coat, white inside, dark red details.
The 8 stickers are described in detail below:
1. Upper left corner of the paper: The female student is wearing a smiling face, and the right hand is open and the player is invited to play. On the left side of the figure is a yellow square with a rough Japanese character 'Good morning'.
2. Top right corner of the paper: A woman's eyes are closed, her eyes are closed, her legs are covered in red, her hands are in front of her chest, and her expression is full of emotion. The character's lower part is a powder colored square with a rough Japanese character 'Thank you', and the floating color of the Zhouyuan is a small love heart.
3. The second line of the paper on the left side: The female student wears a warm and gentle smile, and a cup of hot green tea appears in front of her. Above the figure is a green colored square with a coarse Japanese text 'Thank you for your hard work'.
4. The second line of the right side of the paper: The girl's skin is closed, the left eye is wink, and the right arm is stretched out in front of the right arm. The character 'OK!' is a rough English text drawn by a person.
5. 3rd row left side paper: The woman's eyebrows are slightly trampled, eight-character eyebrows are put together, ten hands are in front of her head, her head is slightly lower than her head, and a drop of blue sweat beads is placed on her face. On the right side of the person is a purple square with a rough Japanese character 'I'm sorry'.
6. 3rd line right hand side of the paper: The girl's eyes are open and the light is shining, both hands are clenched in fists, the body is small, and the expectations are full. Below the figure, there is an orange color with a rough Japanese script depicting 'Greetings'.
7. Bottom left corner of the paper: A woman's eyes are closed, her head is tilted to the side, her hands are hugging her head, and her head is covered in white. 'Good night' written in Japanese characters with a deep blue color on top of the person.
8. Lower right corner paper: female school opening general double arms high high rise, beak opening big smile, Zhou Yan with yellow light shining special effect. Figures below with red color and rough Japanese characters 'Enjoy! '.
The style of chiropractic painting is bright, gentle, full of vitality, and the color style of the Japanese-born Japanese style is perfect, the color palette is suitable, and the lines are clear and clear.
Generally, image generation AIs tend to produce higher quality images the more detailed the prompt is, but writing long prompts takes time, so many users enter only short sentences. To solve this problem, ERNIE-Image incorporates a 'prompt enhancer' that expands short prompts into longer ones. The images below, from left to right, are: 'a manga generated with a short prompt,' 'a manga generated by converting a short prompt to a long prompt using the prompt enhancer,' and 'a manga generated by converting a short prompt to a long prompt using Gemini 3.1 Pro Preview.' You can see that the output is of higher quality when the prompt is converted to a long one.
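The two-stage flow described here (expand the short prompt first, then pass the longer prompt to the image model) can be sketched as follows. This is only an illustration of the pipeline shape: `toy_enhance` and `toy_generate` are hypothetical stand-ins written for this example and are not ERNIE-Image's actual enhancer or API.

```python
from typing import Callable, Tuple


def generate_with_enhancer(
    short_prompt: str,
    enhance: Callable[[str], str],
    generate: Callable[[str], str],
    use_enhancer: bool = True,
) -> Tuple[str, str]:
    """Optionally expand the prompt, then generate; return (prompt used, result)."""
    prompt = enhance(short_prompt) if use_enhancer else short_prompt
    return prompt, generate(prompt)


def toy_enhance(prompt: str) -> str:
    """Hypothetical enhancer: a real one would be a language model."""
    return f"{prompt}. Detailed composition, consistent characters, clear lettering."


def toy_generate(prompt: str) -> str:
    """Hypothetical image model: returns a placeholder string instead of an image."""
    return f"<image from a {len(prompt)}-character prompt>"


used, image = generate_with_enhancer("a four-panel manga about a cat", toy_enhance, toy_generate)
print(used)
print(image)
```

Keeping the enhancer as a separate, swappable stage is what makes the three-way comparison above possible: the same image model can be fed the raw prompt, its own enhancer's output, or a prompt expanded by a different language model.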

The converted long-text prompt looks like this. The prompt enhancer is a 3-billion-parameter language model based on

ERNIE-Image and ERNIE-Image-Turbo have consistently achieved high scores in various benchmark tests. In a test comparing image generation performance from English prompts using OneIG-Bench, they outperformed Z-Image and GPT Image 1 [High].

A demo app is available that makes it easy to try out ERNIE-Image-Turbo, so I actually generated an image using it. First, click the link below to access the demo app.
ERNIE Image - a Hugging Face Space by baidu

Enter the prompt 'A photorealistic scene of a twin-tailed maid sitting on the ground in a narrow back alley, casually reading a newspaper. The newspaper headline clearly reads 'GIGAZINE' in bold letters.' and click 'Generate'.

An image matching the instructions was displayed on the right side of the screen, and the long prompt produced by the prompt enhancer was shown in the lower right corner.

The output image is shown below. It's of very high quality.

Next, the prompt was changed to 'A Japanese-style illustration of a twin-tailed maid sitting on the ground in a narrow back alley, casually reading a newspaper. The newspaper headline clearly reads 'GIGAZINE' in bold letters.' The result is shown below. High-quality illustrations can also be generated.

The following is the result of entering a prompt in Japanese: 'An illustration of a twin-tailed maid sitting in an alley reading a newspaper. The newspaper has the headline 'GIGAZINE' written on it.' Instead of an alley, the image showed a park bench. Japanese is supported to some extent, but the model appears to follow prompts less accurately than it does in English.

ERNIE-Image and ERNIE-Image-Turbo are available for free at the following link. 24GB of VRAM is required to run them. The license is the Apache License 2.0.
baidu/ERNIE-Image · Hugging Face
https://huggingface.co/baidu/ERNIE-Image
baidu/ERNIE-Image-Turbo · Hugging Face
https://huggingface.co/baidu/ERNIE-Image-Turbo
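For readers who prefer to fetch the files from a script rather than the browser, a minimal sketch using the `huggingface_hub` package's `snapshot_download` function might look like this. The repository IDs come from the links above; the `models` target folder and the `download_model` helper are hypothetical names chosen for this example, and the package has to be installed separately.

```python
from pathlib import Path

# Repository IDs taken from the Hugging Face links in the article.
REPO_IDS = ("baidu/ERNIE-Image", "baidu/ERNIE-Image-Turbo")


def download_model(repo_id: str, target_root: str = "models") -> Path:
    """Download all files of one model repository and return the local folder.

    `target_root` is a hypothetical destination chosen for this sketch.
    """
    if repo_id not in REPO_IDS:
        raise ValueError(f"unexpected repository: {repo_id}")

    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(target_root) / repo_id.split("/")[-1]
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_model("baidu/ERNIE-Image-Turbo")` would then place the files under `models/ERNIE-Image-Turbo`. The download is large, so check available disk space first.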
Furthermore, ComfyUI already supports ERNIE-Image, so you can download the model and generate images locally.
ERNIE-Image is now available in ComfyUI
— ComfyUI (@ComfyUI) April 15, 2026
An open-source 8B DiT text-to-image model from @ErnieforDevs, licensed under Apache-2.0.
Key highlights:
- Open-source under Apache-2.0 license
- Precise multilingual text rendering (EN, ZH, and more)
- Complex instruction following — multi-object,… pic.twitter.com/CcVvpSZqXs