AI RESEARCH
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
arXiv cs.LG
arXiv:2605.20722v1 Announce Type: new Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes