PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

ArXi:2605.23902v1 Announce Type: new Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synthesize details, and becomes increasingly costly at megapixel scale. This drawback calls for a expressive and efficient decoding paradigm. Motivated by recent progress in scalable pixel-space diffusion, we.