AI RESEARCH

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

arXiv cs.LG

arXiv:2606.04272v1 Announce Type: new