Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach

arXiv:2605.30903v1 Announce Type: cross Inverse reinforcement learning (IRL) typically assumes demonstrations from a single optimal demonstrator, but in many applications data come from multiple imperfect demonstrators with heterogeneous suboptimality levels. We study reward learning in this setting through a feasible-reward-set framework: for each demonstrator, we encode its declared suboptimality level as a linear constraint and intersect the resulting feasible sets across demonstrators.
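
The intersection of per-demonstrator feasible sets can be illustrated with a minimal sketch. The abstract does not give the exact form of the constraint, so everything below is an assumption made for illustration: rewards are vectors over state-action pairs, each demonstrator is summarized by an occupancy vector and a declared suboptimality level epsilon, and optimality is measured against a finite, hypothetical set of candidate occupancy vectors. Under these assumptions each condition reward · (candidate - demo) <= epsilon is linear in the reward. The function names are invented here and are not from the paper.

```python
import numpy as np

def in_feasible_set(reward, demo_occupancy, epsilon, candidate_occupancies):
    """Check whether `reward` is consistent with one demonstrator.

    Assumed constraint (not taken from the paper): for every candidate
    occupancy mu, reward @ (mu - demo_occupancy) <= epsilon.
    Each such inequality is linear in `reward`.
    """
    demo_value = float(reward @ demo_occupancy)
    # Best achievable value over the finite candidate set (rows = candidates).
    best_value = float(np.max(candidate_occupancies @ reward))
    return best_value - demo_value <= epsilon + 1e-12

def in_intersection(reward, demonstrators, candidate_occupancies):
    """A reward is kept only if it satisfies every demonstrator's constraint."""
    return all(
        in_feasible_set(reward, occupancy, epsilon, candidate_occupancies)
        for occupancy, epsilon in demonstrators
    )

# Toy one-step problem with three actions; candidates are the pure actions.
candidates = np.eye(3)
demonstrators = [
    (np.array([0.9, 0.1, 0.0]), 0.2),  # nearly optimal, small declared gap
    (np.array([0.5, 0.5, 0.0]), 0.6),  # noisier, larger declared gap
]

reward_a = np.array([1.0, 0.0, 0.0])  # favors the action both demonstrators prefer
reward_b = np.array([0.0, 0.0, 1.0])  # favors an action neither demonstrator takes

result_a = in_intersection(reward_a, demonstrators, candidates)
result_b = in_intersection(reward_b, demonstrators, candidates)
```

In this toy case `reward_a` lies in both feasible sets, while `reward_b` is excluded because it would make the first demonstrator far more suboptimal than its declared level allows.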