AI RESEARCH
ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning
arXiv CS.AI
arXiv:2602.02150v2 Announce Type: replace-cross
Test-time reinforcement learning generates multiple candidate answers via repeated rollouts and performs online updates using pseudo-labels constructed by majority voting. To reduce overhead and improve exploration, prior work