AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

arXiv:2605.20722v1 Announce Type: new Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes