SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

arXiv:2506.14648v2 Announce Type: replace-cross Preference-based Reinforcement Learning (PbRL) methods avoid reward engineering by learning reward models from human preferences. However, poor feedback efficiency and poor sample efficiency remain problems that hinder the application of PbRL.
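
To make "learning reward models from human preferences" concrete, the following is a minimal sketch of the Bradley-Terry preference objective that is commonly used in the PbRL literature. The abstract does not say which preference model this paper uses, so the choice of Bradley-Terry is an assumption, and every name here (`segment_return`, `preference_probability`, `preference_loss`, the toy `reward_fn`) is hypothetical and for illustration only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(reward_fn, segment):
    """Sum of predicted rewards over a segment of (state, action) pairs."""
    return sum(reward_fn(state, action) for state, action in segment)


def preference_probability(reward_fn, seg_a, seg_b):
    """Probability that seg_a is preferred over seg_b (Bradley-Terry, assumed)."""
    diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    return sigmoid(diff)


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Cross-entropy between the human label and the predicted preference.

    label is 1.0 if the human preferred seg_a, 0.0 if seg_b,
    and 0.5 if the two segments were judged equally good.
    """
    eps = 1e-12  # guards against log(0)
    p = preference_probability(reward_fn, seg_a, seg_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Toy reward model (hypothetical): reward equals the scalar state value.
def reward_fn(state, action):
    return state


seg_a = [(1.0, 0), (2.0, 0)]  # predicted return 3.0
seg_b = [(0.5, 0), (0.5, 0)]  # predicted return 1.0
print(preference_probability(reward_fn, seg_a, seg_b))
print(preference_loss(reward_fn, seg_a, seg_b, label=1.0))
```

In a real system `reward_fn` would be a trainable network and the loss would be minimized by gradient descent over a dataset of labeled segment pairs. Each label costs one human query, which is why feedback efficiency matters.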