One-Step Bellman Alignment Enables Provably Efficient Transfer in Online RL

arXiv:2601.21924v2

We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets.