Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

arXiv:2510.11683v3 A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation during