Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

arXiv:2605.23522v1 Announce Type: cross

Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with a Stochastic Differential Equation