Goal-Conditioned Agents that Learn Everything All at Once

arXiv:2605.23551v1 Announce Type: cross A goal-conditioned reinforcement learning agent exploring an environment sees a wealth of information over the course of a trajectory, most of which is discarded when it performs only on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for off-policy learning with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling.
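
To make the cost of naive relabelling concrete, here is a minimal tabular sketch. It is an illustration only, not the method of the paper. It assumes that goals are states, that the reward is 1 when the next state equals the goal and 0 otherwise, and that reaching the goal terminates the episode. The function name `all_goals_update` and the parameters `alpha` and `gamma` are hypothetical.

```python
import numpy as np

def all_goals_update(Q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one off-policy Q-learning update per goal for a single transition.

    Q has shape (n_goals, n_states, n_actions). Assumption: goals are states,
    reward is 1 and the episode terminates when s_next equals the goal.
    """
    n_goals = Q.shape[0]
    # Naive relabelling: the same transition is replayed once for every goal,
    # so the cost per transition grows linearly with the number of goals.
    for g in range(n_goals):
        reached = (s_next == g)
        reward = 1.0 if reached else 0.0
        # No bootstrapping past a terminal (goal-reaching) transition.
        bootstrap = 0.0 if reached else gamma * Q[g, s_next].max()
        Q[g, s, a] += alpha * (reward + bootstrap - Q[g, s, a])
    return Q

# Toy setting: 4 states (each also a goal), 2 actions, one observed transition.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_states, n_actions))
Q = all_goals_update(Q, s=0, a=1, s_next=2)
```

In this toy setting a single transition triggers one update per goal. With a large or continuous goal space and a neural value function, the same loop would become one relabelled training example per goal per transition, which is the infeasibility the abstract refers to.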