Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

arXiv:2602.11146v2 Announce Type: replace-cross Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward providers, leveraging their rich multimodal priors to guide alignment. However, their computational and memory costs can be substantial, and optimizing a latent diffusion generator through a pixel-space reward