Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

arXiv:2602.06886v3 Announce Type: replace

Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between text tokens and visual latents throughout denoising. In this setting, we observe a prompt forgetting phenomenon: the semantics of the prompt representation in the text branch are progressively forgotten as depth increases.