Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers
arXiv cs.CV
arXiv:2602.06886v3 (Announce Type: replace)

Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between text tokens and visual latents throughout denoising. In this setting, we observe a prompt forgetting phenomenon: the semantics of the prompt representation in the text branch are progressively forgotten as network depth increases.
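To make the setup concrete, here is a minimal toy sketch, not the authors' code. It assumes a simplified MMDiT-style block in which each branch has its own query/key/value/output weights, but all text and image tokens attend to each other in one joint attention, which gives the bidirectional text-image flow. Real MMDiTs also use multi-head attention, normalization, adaptive modulation and MLPs, all omitted here. The weights are random, and the probe `prompt_retention_curve` is a hypothetical diagnostic of our own: it tracks the mean cosine similarity between each layer's text tokens and the input prompt tokens. It is not the paper's metric and reproduces none of its results.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class JointAttentionBlock:
    """Toy MMDiT-style block (hypothetical, simplified): separate per-branch
    Q/K/V/O weights, one attention over the concatenated text+image sequence."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        # Branch-specific parameters (random init, for illustration only).
        self.w_txt = {k: rng.standard_normal((dim, dim)) * scale for k in ("q", "k", "v", "o")}
        self.w_img = {k: rng.standard_normal((dim, dim)) * scale for k in ("q", "k", "v", "o")}
        self.dim = dim

    def __call__(self, txt, img):
        n_txt = txt.shape[0]
        # Project each modality with its own weights, then concatenate.
        q = np.concatenate([txt @ self.w_txt["q"], img @ self.w_img["q"]])
        k = np.concatenate([txt @ self.w_txt["k"], img @ self.w_img["k"]])
        v = np.concatenate([txt @ self.w_txt["v"], img @ self.w_img["v"]])
        # Every token attends to every token: text <-> image flow in both directions.
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        out = attn @ v
        # Split back per branch, apply branch-specific output projection plus residual.
        txt_out = txt + out[:n_txt] @ self.w_txt["o"]
        img_out = img + out[n_txt:] @ self.w_img["o"]
        return txt_out, img_out


def mean_cosine(a, b):
    # Mean per-token cosine similarity between two (tokens, dim) arrays.
    a_n = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.mean(np.sum(a_n * b_n, axis=-1)))


def prompt_retention_curve(blocks, txt, img):
    """Hypothetical probe: similarity of each layer's text tokens to the input prompt."""
    prompt = txt.copy()
    curve = []
    for block in blocks:
        txt, img = block(txt, img)
        curve.append(mean_cosine(txt, prompt))
    return curve


rng = np.random.default_rng(0)
dim, n_txt, n_img, depth = 16, 8, 32, 12
blocks = [JointAttentionBlock(dim, rng) for _ in range(depth)]
txt = rng.standard_normal((n_txt, dim))
img = rng.standard_normal((n_img, dim))
curve = prompt_retention_curve(blocks, txt, img)
print([round(c, 3) for c in curve])
```

Applied to a trained model's text-branch hidden states, a curve that falls with depth would be consistent with the forgetting the abstract describes. With the random weights above, the printed values carry no information about real MMDiTs.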