May 27, 2026 11:25:00

NVIDIA has developed a new technology called 'PiD' that directly converts the latent representations of image generation AI into high-resolution images.

NVIDIA's research team has announced 'PiD (Pixel Diffusion Decoder),' a technology that directly converts vector latent representations into high-resolution images. PiD aims to replace the conventional cascaded process of decoding at low resolution and then applying super-resolution, achieving both low latency and high visual quality.

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

https://research.nvidia.com/labs/sil/projects/pid/

[2605.23902] PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
https://arxiv.org/abs/2605.23902

The dominant algorithm for current image generation AI is the 'diffusion model.' The basic mechanism of image generation using the diffusion model can be better understood by reading the following article.

A detailed diagram illustrating how the image generation AI 'Stable Diffusion' generates images from text - GIGAZINE

High-resolution text-to-image generation widely relies on generating images in a compact 'latent space' (a numerical space in which data features are mathematically compressed) and then converting them into high-resolution images with a decoder. However, conventional decoders are optimized for reconstructing images from the encoder's output, and they are limited both in their ability to synthesize new details and in their efficiency at the megapixel level.

PiD redefines latent decoding as conditional pixel diffusion and integrates decoding and upsampling into a single generative module.

Latent representations provide the overall structure and meaning, while a pixel diffusion model directly synthesizes the high-resolution details. To do this, PiD adds a lightweight ControlNet-like adapter to a pixel-space diffusion model based on PixelDiT.

This ControlNet-like adapter injects a noisy latent representation into the model and uses sigma-aware gates to adjust how much confidence is placed in the latent representation based on the amount of noise. This method allows PiD to generate 4x or 8x upscaled images with low latency.
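
As a rough illustration of the sigma-aware gating idea, the Python sketch below scales an injected latent feature by a gate that shrinks as the noise level sigma grows. The sigmoid form, the `weight` and `bias` constants, and the function names `sigma_gate` and `inject` are assumptions made for illustration; the article does not describe how PiD actually parameterizes its gates.

```python
import math


def sigma_gate(sigma, weight=-4.0, bias=2.0):
    """Hypothetical gate in (0, 1) that decreases as sigma increases.

    weight and bias are illustrative constants; a real model would learn them.
    """
    return 1.0 / (1.0 + math.exp(-(weight * sigma + bias)))


def inject(hidden, latent_feature, sigma):
    """Add the latent feature to the hidden state, scaled by the gate."""
    gate = sigma_gate(sigma)
    return [h + gate * c for h, c in zip(hidden, latent_feature)]


clean = inject([0.0, 0.0], [1.0, 1.0], sigma=0.0)  # latent is trusted more
noisy = inject([0.0, 0.0], [1.0, 1.0], sigma=1.0)  # latent is trusted less
```

With these placeholder constants, a clean latent contributes most of its signal to the hidden state, while a heavily noised one contributes much less, leaving more of the work to the pixel-space model.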

NVIDIA's research team reports converting the latent representation of a 512x512 image into a 2048x2048-pixel image in less than a second on a consumer-grade RTX 5090, with peak memory usage of 13GB. On a GB200 GPU, the same process reportedly completes in as little as 210ms, which the team says is approximately six times faster than cascaded pipelines that use diffusion-based super-resolution. The team also highlights the high visual fidelity of the results.
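
To put the reported resolutions in perspective, the short Python sketch below works out the upscaling factor and output size for the 512x512 to 2048x2048 case. The helper name `upscale_stats` is hypothetical and the calculation is plain arithmetic on the figures quoted above, not something taken from the paper.

```python
def upscale_stats(src_side, dst_side):
    """Return (per-side factor, pixel-count factor, output megapixels)."""
    side_factor = dst_side / src_side
    pixel_factor = side_factor ** 2
    megapixels = dst_side * dst_side / 1_000_000
    return side_factor, pixel_factor, megapixels


side, pixels, mp = upscale_stats(512, 2048)
print(side, pixels, round(mp, 2))  # 4.0 16.0 4.19
```

In other words, the 4x upscale per side means 16 times as many pixels, or roughly 4.2 megapixels per image. If 'approximately six times' is taken literally, the cascaded pipeline would need roughly 6 x 210ms, or about 1.26 seconds, on the same hardware, although the article gives no exact baseline figure.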

Furthermore, PiD can handle not only fully denoised latent representations but also intermediate latent representations. This makes it possible to terminate the inference of the underlying latent diffusion model midway and convert the remainder into a high-resolution image on the PiD side.
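
The early hand-off described above can be sketched in Python as follows. This is a toy outline under stated assumptions: the linear noise schedule, the function names, and `toy_denoise_step` (a stand-in for one step of the underlying latent diffusion model) are all hypothetical and not part of PiD's published interface.

```python
def linear_sigmas(num_steps, sigma_max=1.0):
    """Noise levels from sigma_max down to 0 (an assumed linear schedule)."""
    return [sigma_max * (1.0 - i / num_steps) for i in range(num_steps + 1)]


def run_partial(latent, sigmas, stop_after, denoise_step):
    """Run only the first stop_after steps; return the latent and its sigma."""
    for i in range(stop_after):
        latent = denoise_step(latent, sigmas[i], sigmas[i + 1])
    return latent, sigmas[stop_after]


def toy_denoise_step(latent, sigma_from, sigma_to):
    """Placeholder for one step of the base model: rescales the values."""
    scale = sigma_to / sigma_from
    return [x * scale for x in latent]


sigmas = linear_sigmas(num_steps=10)
latent, sigma = run_partial([1.0, -2.0], sigmas, stop_after=6,
                            denoise_step=toy_denoise_step)
# (latent, sigma) would then be handed to the decoder instead of running
# the remaining 4 steps in latent space.
```

The point of returning `sigma` alongside the latent is that the decoder needs to know how noisy the hand-off still is, which is the information a sigma-aware gate would act on.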

In addition, distillation with DMD2 reduces inference to four steps. This also reduces the need to run unconditional inference separately, with the aim of simplifying the entire high-resolution image generation pipeline.
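
The Python sketch below gives a feel for the savings by counting network evaluations per image. It assumes that 'unconditional inference' refers to the extra forward pass used by classifier-free guidance and that the distilled model needs no such pass at all; the article does not spell out either point. The baseline step count of 50 is a placeholder, since the article gives no figure for the undistilled model.

```python
def forward_passes(steps, separate_unconditional):
    """Network evaluations per image: 2 per step if an unconditional pass is run."""
    return steps * (2 if separate_unconditional else 1)


def guided_prediction(cond, uncond, scale):
    """Standard classifier-free guidance combination of two predictions."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


BASELINE_STEPS = 50  # placeholder; the article gives no undistilled step count
baseline = forward_passes(BASELINE_STEPS, separate_unconditional=True)
distilled = forward_passes(4, separate_unconditional=False)
```

Under these assumptions, `guided_prediction` shows why guidance normally doubles the per-step cost: it needs both a conditional and an unconditional prediction at every step.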

The supported latent representations are not limited to those of conventional VAEs; PiD can also be applied to RAE-based models that use semantic representations, such as SigLIP and DINOv2. This allows for generative detail completion of latent representations that tend to lack low-level appearance information, while preserving their semantic structure.

The training data included MultiAspect-4K-1M images, rendered PDF data, and internally sourced high-resolution images. After low-quality samples were removed using Q-Align, 2.6 million high-quality images were reportedly used.

The significance of PiD lies in repositioning the decoder, the final stage in image generation, not merely as a restorer, but as a high-resolution module with generation capabilities. Its design, which efficiently creates the overall structure in latent space and synthesizes details in pixel space, is attracting attention as an approach that improves both the processing time and quality of high-resolution image generation.

May 27, 2026 11:25:00 in AI, Software, Posted by log1i_yk