Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

arXiv:2606.00367v1 Announce Type: cross

Reinforcement learning problems typically define the goal as maximizing the expected value of a scalar reward function. However, pairwise preferences are often easier to specify than scalar rewards, and they can express certain goals that scalar rewards cannot. Methods for reinforcement learning with pairwise preferences have therefore attracted growing interest.