AI RESEARCH
Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning
arXiv cs.CL
Post-