Learning to Reason Efficiently with Discounted Reinforcement Learning

arXiv:2510.23486v2

Large reasoning models (LRMs) often consume excessive tokens, inflating computational cost and latency. More broadly, in goal-reaching sequential decision problems we often want to reach the goal quickly, and LRM reasoning can be viewed through this lens. We challenge the assumption that longer responses improve accuracy.
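To see why a discounted objective rewards reaching the goal quickly, consider a minimal sketch that makes a few assumptions of its own. It is not the paper's exact training objective. Each generated token is treated as one step. A correct answer earns a reward of 1.0 on its final token, and every other token earns 0. A per-token discount factor `gamma` is applied. The helper names `discounted_return` and `terminal_reward_trajectory` are hypothetical and used only for illustration.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma**t * r_t for a single trajectory of per-step rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def terminal_reward_trajectory(length, correct=True):
    """Hypothetical reward sequence: 0.0 for every token, 1.0 on the last token if correct."""
    rewards = [0.0] * length
    if length > 0 and correct:
        rewards[-1] = 1.0
    return rewards


# Assumed discount factor, chosen only for illustration.
gamma = 0.999

# Two equally correct responses that differ only in length (in tokens).
short_return = discounted_return(terminal_reward_trajectory(100), gamma)
long_return = discounted_return(terminal_reward_trajectory(1000), gamma)

# Without discounting (gamma = 1.0), both responses receive the same return.
short_undiscounted = discounted_return(terminal_reward_trajectory(100), 1.0)
long_undiscounted = discounted_return(terminal_reward_trajectory(1000), 1.0)

print(f"discounted:   short={short_return:.4f}  long={long_return:.4f}")
print(f"undiscounted: short={short_undiscounted:.4f}  long={long_undiscounted:.4f}")
```

With `gamma = 1.0`, a correct 100-token answer and a correct 1000-token answer both score 1.0, so nothing favors brevity. With `gamma < 1`, the return of a correct answer shrinks geometrically with its length, as `gamma ** (length - 1)`. The shorter correct response is therefore preferred, while correctness is still rewarded.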