AI RESEARCH
Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
arXiv cs.LG
arXiv:2605.26491v1. Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary pairwise comparisons. This pairwise reduction is limiting when