Value Flows

arXiv:2510.07650v4 Announce Type: replace-cross

While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration and safe RL.