AI RESEARCH
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
arXiv CS.AI
arXiv:2606.00395v1 Announce Type: cross
Mixture of Experts (MoE) Large Language Models (LLMs) achieve strong performance at scale. However, reinforcement learning (RL) on MoE-based LLMs often suffers from