ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

arXiv:2605.28293v1 Announce Type: cross

Proactive Recommender Systems (PRSs) aim to shift user preferences toward target items by generating paths of intermediate recommendations. Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, because path rewards naturally capture both short-term acceptance and long-term guidance effectiveness. However, naively applying policy gradients to PRSs results in deficient gradient estimation.