DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
arXiv cs.LG
arXiv:2605.21467v1 Announce Type: new
Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We
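To make the credit-assignment question concrete, here is a minimal sketch of a standard outcome-reward setup. It assumes a GRPO-style group-normalized advantage, which the abstract does not specify, and it is not DelTA's method. All helper names are hypothetical. The sketch shows that when only a response-level reward is available, every token in a response inherits the same scalar advantage. The policy-gradient update then pushes each sampled token's logit by `advantage * (onehot - softmax)`, regardless of how much that token actually contributed to the verified answer.

```python
import math
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Assumed GRPO-style baseline: standardize each response's verifiable
    reward against the other responses sampled for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def token_advantages(advantage: float, num_tokens: int) -> List[float]:
    """Outcome-only credit assignment: the response-level advantage is
    broadcast unchanged to every token of that response."""
    return [advantage] * num_tokens


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single next-token distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def logit_gradient(logits: List[float], sampled: int, advantage: float) -> List[float]:
    """Gradient of advantage * log pi(sampled | context) with respect to the
    logits, i.e. advantage * (onehot(sampled) - softmax(logits))."""
    probs = softmax(logits)
    return [advantage * ((1.0 if i == sampled else 0.0) - p)
            for i, p in enumerate(probs)]


if __name__ == "__main__":
    # Four sampled responses to one prompt; 1.0 = verified correct, 0.0 = wrong.
    adv = group_advantages([1.0, 0.0, 0.0, 1.0])
    print(adv)                           # approximately [1, -1, -1, 1]
    print(token_advantages(adv[0], 3))   # the same value for all three tokens
    print(logit_gradient([0.0, 0.0], sampled=0, advantage=1.0))  # [0.5, -0.5]
```

Because `token_advantages` assigns one value per response, an informative reasoning step and a filler token in a correct answer receive identical updates. This uniform treatment is the behavior that finer-grained, token-level credit assignment aims to improve.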