Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation

arXiv:2602.08646v2

We propose a gradient preconditioning method that makes reward-guided generation with one-step generative models both efficient and reliable. Test-time noise optimization can unlock substantially better reward-guided generations from pretrained generative models, but it is prone to reward hacking, which degrades sample quality, and it is often too slow for practical use.
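To make "test-time noise optimization" concrete, here is a minimal toy sketch. It is not the authors' method, and the abstract does not specify their preconditioner. The sketch assumes a hypothetical linear one-step generator G(z) = A z and a quadratic reward r(x) = -||x - target||^2, so that all gradients are analytic. The input noise z is updated by gradient ascent on the reward. The `precondition` hook maps the raw gradient to an update direction: the identity gives plain ascent, and the example also tries a generic diagonal (Newton-style) preconditioner built from the toy problem's known Hessian. The optional `prior_weight` penalty is a common heuristic for keeping z near the Gaussian prior, and it is also an assumption.

```python
import numpy as np


def reward(x, target):
    """Hypothetical toy reward: negative squared distance to a target output."""
    return -float(np.sum((x - target) ** 2))


def noise_optimize(z0, A, target, steps=200, lr=0.05, prior_weight=0.0,
                   precondition=lambda g: g):
    """Toy test-time noise optimization for a one-step generator.

    Assumptions (illustrative only, not from the paper):
      * the generator is linear, G(z) = A @ z;
      * the reward is r(x) = -||x - target||^2;
      * prior_weight adds -0.5 * prior_weight * ||z||^2 to the objective;
      * precondition maps the raw gradient to the update direction.
    """
    z = z0.copy()
    for _ in range(steps):
        x = A @ z                                  # one generator forward pass
        grad_x = -2.0 * (x - target)               # dr/dx
        grad_z = A.T @ grad_x - prior_weight * z   # chain rule back to the noise
        z = z + lr * precondition(grad_z)          # ascent step on the objective
    return z


# Ill-conditioned toy generator: one output direction is 4x weaker than the other.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
target = np.array([1.0, 1.0])
z0 = np.zeros(2)

# Plain gradient ascent (identity preconditioner).
z_plain = noise_optimize(z0, A, target)

# Hypothetical diagonal preconditioner: divide by the diagonal of the
# objective's Hessian, which for this toy problem is 2 * A^T A = diag(8, 0.5).
hessian_diag = np.diag(2.0 * A.T @ A)
z_pre = noise_optimize(z0, A, target, precondition=lambda g: g / hessian_diag)

print("initial reward:       ", reward(A @ z0, target))
print("plain ascent reward:  ", reward(A @ z_plain, target))
print("preconditioned reward:", reward(A @ z_pre, target))
```

With the same step budget, plain ascent stalls along the weak direction because it contracts by only 0.975 per step. The preconditioned update contracts every direction by the same factor of 0.95 per step. This toy illustrates why preconditioning can make noise optimization converge in fewer steps. It says nothing about how the paper's preconditioner addresses reward hacking.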