FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision--Language Generation

arXiv:2605.20316v1 Announce Type: cross

Modern text-to-image diffusion models encode rich visual priors, but expose them only through one-way text-conditioned generation. Existing unified vision--language models derived from them recover bidirectional capability through large-scale joint pre