'GenAI Image Editing Showdown' lets you compare how well image generation AIs follow text editing instructions by looking at the actual edited images



'GenAI Image Showdown' is a website that compiles the results of giving the same prompt to multiple image generation AIs, so users can compare which of them generate images that are faithful to the prompt. Its companion, the 'GenAI Image Editing Showdown,' compares how faithfully models follow instructions when they edit an existing image via text instructions rather than generate a new one.

GenAI Image Showdown
https://genai-showdown.specr.net/image-editing

The GenAI Image Showdown tests six image-generating AIs by inputting the same prompt and evaluating how well they can generate images according to the prompt, based on a set of evaluation criteria.

'GenAI Image Showdown' shows which image generation AI can generate images faithful to the prompt - GIGAZINE



The GenAI Image Editing Showdown, accessible on the same site, compares how faithfully different models edit the same image using the same prompts. The comparison rules state that each edit must be done with a single prompt, and that only edits made through text prompts are allowed, even if the AI model has its own image editing features. Note that the number of attempts varies from challenge to challenge. Some challenges were cleared after a few attempts, while others were given fewer attempts because the model showed fundamental problems that seemed unlikely to be solved by repeated attempts.

At the time of writing, the seven AIs used in the GenAI Image Editing Showdown are 'Gemini 2.5 Flash,' 'FLUX.1 Kontext [dev],' 'FLUX.1 Kontext [max],' VectorSpaceLab's 'OmniGen2,' OpenAI's 'gpt-image-1,' Alibaba's 'Qwen-Image-Edit,' and ByteDance's 'Seedream 4.'

In the GenAI Image Editing Showdown, seven different image-generating AIs are fed the same image and given the same editing prompts. The images are accompanied by sliders, allowing users to compare the before and after by moving the slider from right to left.



In response to the prompt 'Make this man's hair fluffy,' all of the image generation AIs added volume to the man's hair, but 'OmniGen2' and 'gpt-image-1' also changed things other than the hair, such as the man's facial expression and coloring, so they were rated as failures.



For the Jaws poster, the editing instructions were to change the shark to a cat's paw, change the title 'JAWS' to 'PAWS,' and change the swimming woman to a goldfish, while keeping the original aesthetic intact. Five AIs were successful, but 'FLUX.1 Kontext [max]' produced a cat's paw that looked like an invisible hand, and 'OmniGen2' was judged to have spoiled the atmosphere of the original poster design.



In the challenge of adding a surfer to a ukiyo-e print, 'FLUX.1 Kontext [dev]' failed by adding the surfer as a silhouette for some reason, while 'OmniGen2' added an icon-like surfer.



Only three models were able to complete the challenge of 'straightening the Leaning Tower of Pisa': 'FLUX.1 Kontext [dev]', 'FLUX.1 Kontext [max]', and 'Seedream 4'.



The most difficult prompt was 'Make the giraffe's neck significantly shorter.' Some models left the image completely unchanged, others only removed the giraffe's pattern, and some even lost the giraffe's movement and neck. Only Seedream 4 was able to shorten the neck.



In addition, when an image of five stacked blocks was edited with the prompt 'Swap the positions of the blue and yellow blocks,' none of the AIs were able to swap just those two blocks. According to the site, the blocks in the original image were deliberately made different sizes. This prevents the 'prompting trick' of reducing the block swap to a simple color swap, which makes the challenge harder.



According to the results of the GenAI Image Editing Showdown, Seedream 4 was ranked the best image editing AI, clearing nine of the 12 challenges, followed by Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], OpenAI gpt-image-1, FLUX.1 Kontext [max], and OmniGen2.
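
A ranking like this comes down to counting how many challenges each model cleared. As a minimal sketch, the Python below tallies only the five challenges whose pass/fail outcomes are spelled out in this article (hair, Jaws poster, Leaning Tower of Pisa, giraffe, blocks). The challenge labels and the `tally` helper are made up for illustration, and because this covers five of the 12 challenges, the resulting counts are not the site's actual scores and the order may not match the site's ranking.

```python
from collections import Counter

MODELS = [
    "Gemini 2.5 Flash",
    "FLUX.1 Kontext [dev]",
    "FLUX.1 Kontext [max]",
    "OmniGen2",
    "gpt-image-1",
    "Qwen-Image-Edit",
    "Seedream 4",
]

# Challenge label (hypothetical) -> models that cleared it, as described in the article.
PASSED = {
    "fluffy hair": set(MODELS) - {"OmniGen2", "gpt-image-1"},
    "Jaws to PAWS": set(MODELS) - {"FLUX.1 Kontext [max]", "OmniGen2"},
    "straighten Pisa": {"FLUX.1 Kontext [dev]", "FLUX.1 Kontext [max]", "Seedream 4"},
    "shorten giraffe neck": {"Seedream 4"},
    "swap blocks": set(),
}


def tally(passed, models):
    """Count cleared challenges per model, keeping models with zero clears."""
    counts = Counter({model: 0 for model in models})
    for winners in passed.values():
        counts.update(winners)
    return counts


counts = tally(PASSED, MODELS)
# Sort by cleared count, highest first; ties keep the order of MODELS.
ranking = sorted(counts.items(), key=lambda item: -item[1])

for model, cleared in ranking:
    print(f"{model}: {cleared}/{len(PASSED)}")
```

On this partial data Seedream 4 still comes out on top and OmniGen2 at the bottom, which agrees with the two ends of the site's ranking.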



in AI, Posted by log1e_dh