Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

arXiv:2605.29028v1 Announce Type: cross Conditioned Sequence Models (CSMs) learn policies by treating return-to-go (RTG) as a control signal. However, existing CSMs often treat RTGs as plain numerical inputs rather than aligning them with the performance of the resulting policies. In this paper, we propose Q-ALIGN DT, a framework that enforces this alignment by ensuring the $Q$-value of the output policy is consistent with the input