AI RESEARCH

Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization

arXiv CS.AI

arXiv:2601.10201v2 Announce Type: replace-cross

Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-