AI RESEARCH
Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization
arXiv cs.AI
arXiv:2601.10201v2 Announce Type: replace-cross
Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-training …
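GRPO is called critic-free because it needs no learned value model. For each prompt it samples a group of completions and uses the group's own reward statistics as the baseline. The following is a minimal sketch of that standard group-relative advantage under outcome supervision, assuming one scalar reward per completion. It shows the baseline GRPO estimator, not the Future-KL method this paper proposes. The function names `group_relative_advantages` and `broadcast_to_tokens` and the `eps` parameter are hypothetical, chosen for illustration.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each completion's reward against its sampling group.

    Hypothetical helper: the group mean acts as the baseline, so no critic
    (value network) is trained. Population std is an assumption here; some
    implementations use the sample std instead.
    """
    g = len(rewards)
    if g == 0:
        raise ValueError("need at least one reward in the group")
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def broadcast_to_tokens(advantages: List[float], lengths: List[int]) -> List[List[float]]:
    """Give every token of completion i the same advantage A_i.

    Hypothetical helper: with outcome-level rewards, vanilla GRPO assigns one
    sequence-level advantage uniformly across all of a completion's tokens.
    """
    if len(advantages) != len(lengths):
        raise ValueError("one length per completion is required")
    return [[a] * n for a, n in zip(advantages, lengths)]


if __name__ == "__main__":
    adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(adv)
    print(broadcast_to_tokens(adv, [3, 2, 1, 2]))
```

With binary rewards `[1.0, 0.0, 0.0, 1.0]` the group mean is 0.5 and the standard deviation is 0.5, so the advantages come out at roughly `[1, -1, -1, 1]`. Because every token in a completion shares one advantage in this baseline, the estimator carries no per-step (process-level) credit signal on its own.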