Rethinking Cross-Layer Information Routing in Diffusion Transformers

arXiv:2605.20708v1

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, objectives, and latent autoencoders -- has been extensively revisited. The residual stream that governs how information accumulates across layers, however, has been inherited directly from the original Transformer.