From Correctness to Preference: A Framework for Personalized Agentic Reinforcement Learning

arXiv:2605.23382v1 Announce Type: new Agentic reinforcement learning (Agentic RL) has achieved strong progress in tasks with clear success signals. However, many real-world agent applications require user-conditioned behavior: the same query may call for different planning strategies and tool-use decisions across users. This setting raises three key challenges: generic rewards cannot capture heterogeneous user preferences, observed behaviors are entangled with conformity effects, and flat memories cannot support personalized skill retrieval.