AI RESEARCH
RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training
arXiv cs.LG
arXiv:2606.04272v1 Announce Type: new