AI RESEARCH
Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems
arXiv CS.AI
arXiv:2605.26657v1. Long-horizon decision problems with cumulative damage couple locally attractive actions to globally adverse outcomes. We identify two orthogonal failure modes for policy-gradient methods on this class and propose a decomposition that separates them: \emph{completion} (reaching the terminal horizon rather than exiting via an implicit terminal constraint) and \emph{optimality} (matching the dynamic-programming reference given completion