Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

arXiv:2510.07257v2

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized