RiT: Vanilla Diffusion Transformers Suffice in Representation Space

arXiv:2605.21981v1 Announce Type: new Flow matching with $x$-prediction -- regressing the clean data point rather than the ambient velocity -- is known to exploit low-dimensional manifold structure effectively in pixel space \cite{li2025back}. We ask whether a pretrained representation space, while containing a low-dimensional data manifold of comparable intrinsic dimensionality, offers a distribution favorable for flow-matching learning.
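
To make the distinction between $x$-prediction and velocity prediction concrete, the following is a minimal sketch under one common convention, the linear interpolation path, which the abstract itself does not specify. The symbols $\epsilon$, $x_t$, $v$, $\hat{x}_\theta$, and $\hat{v}_\theta$ are illustrative assumptions, not notation taken from the paper.

```latex
% Assumed convention: linear path from noise (t = 0) to data (t = 1).
x_t = (1 - t)\,\epsilon + t\,x, \qquad \epsilon \sim \mathcal{N}(0, I), \quad t \in [0, 1]

% Velocity target along this path:
v = \frac{\mathrm{d} x_t}{\mathrm{d} t} = x - \epsilon

% x-prediction regresses the clean data point directly:
\mathcal{L}_x = \mathbb{E}_{x,\,\epsilon,\,t}\, \bigl\lVert \hat{x}_\theta(x_t, t) - x \bigr\rVert^2

% A velocity for sampling can be recovered from the x-prediction (for t < 1),
% since x - x_t = (1 - t)(x - \epsilon):
\hat{v}_\theta(x_t, t) = \frac{\hat{x}_\theta(x_t, t) - x_t}{1 - t}
```

Under this assumed path, the two targets are related by a time-dependent rescaling, so the choice between them changes what the network must output (a point near the data manifold versus a full-dimensional velocity) rather than the underlying flow.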