Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
arXiv CS.AI
arXiv:2601.18175v2
A widely used technique for improving policies is success conditioning: one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along the successful trajectories. This principle appears under many names (rejection sampling with SFT, goal-conditioned RL, Decision Transformers), yet what optimization problem it solves, if any, has remained unclear.
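To make the collect, filter, imitate loop concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the paper's method: the three-step toy task, its success rule, and the names `rollout` and `success_conditioning_step` are all hypothetical. For a tabular policy, imitating successful trajectories by maximum likelihood reduces to setting each state's action probabilities to the empirical action frequencies observed in the successful data.

```python
import random
from collections import defaultdict

HORIZON = 3        # hypothetical toy task: three decision steps
ACTIONS = (0, 1)   # two actions per step


def rollout(policy, rng):
    """Sample one trajectory; return ([(state, action), ...], success)."""
    steps = []
    for state in range(HORIZON):
        action = rng.choices(ACTIONS, weights=policy[state])[0]
        steps.append((state, action))
    # Assumed success rule: succeed with probability (#times action 1) / HORIZON.
    ones = sum(action for _, action in steps)
    success = rng.random() < ones / HORIZON
    return steps, success


def success_conditioning_step(policy, n_trajectories, rng):
    """One round of success conditioning for a tabular policy.

    1. Collect trajectories from the current policy.
    2. Keep only those that succeeded.
    3. Imitate them: the maximum-likelihood tabular policy is the
       empirical action frequency per state in the kept data.
    """
    counts = defaultdict(lambda: [0] * len(ACTIONS))
    for _ in range(n_trajectories):
        steps, success = rollout(policy, rng)
        if not success:
            continue  # discard unsuccessful trajectories
        for state, action in steps:
            counts[state][action] += 1

    new_policy = {}
    for state in range(HORIZON):
        total = sum(counts[state])
        if total == 0:
            # No successful data for this state: keep the old distribution.
            new_policy[state] = list(policy[state])
        else:
            new_policy[state] = [c / total for c in counts[state]]
    return new_policy


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = {s: [0.5, 0.5] for s in range(HORIZON)}
    print(success_conditioning_step(uniform, 5000, rng))
```

In this toy task, starting from a uniform policy, the updated policy should shift probability toward action 1 in every state, because action 1 is over-represented among the successful trajectories. Repeating the step with the new policy as the data-collecting policy gives the iterated form of the procedure.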