AI RESEARCH

PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models

arXiv cs.CL

arXiv:2605.20813v1 Announce Type: new Inference in diffusion large language models (dLLMs) is computationally expensive because full self-attention must be recomputed at every step of the denoising process without a KV cache. Recent sparse attention methods for dLLMs reduce this cost with block-sparse computation, but they apply it only in later iterations, when model performance is less sensitive to coarse-grained sparse approximation, and they yield only limited gains in computational efficiency and speed.
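One way to read the "limited improvements" remark is as a simple cost argument: if the early denoising steps stay dense, they put a floor under the total attention cost no matter how sparse the later steps are. The sketch below is a back-of-the-envelope model, not the paper's method. The function names, the `keep_ratio` parameter, and all numbers are hypothetical, and the model counts only query-key pairs scored, ignoring kernel overheads and every non-attention cost.

```python
def attention_pair_count(seq_len: int, steps: int,
                         sparse_start: int, keep_ratio: float) -> float:
    """Query-key pairs scored across all denoising steps.

    Steps [0, sparse_start) use full attention (seq_len**2 pairs each).
    Steps [sparse_start, steps) keep only `keep_ratio` of those pairs,
    standing in for a block-sparse mask (hypothetical parameter).
    """
    if not 0 <= sparse_start <= steps:
        raise ValueError("sparse_start must lie in [0, steps]")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")
    full = seq_len * seq_len
    dense_steps = sparse_start
    sparse_steps = steps - sparse_start
    return dense_steps * full + sparse_steps * keep_ratio * full


def attention_speedup(seq_len: int, steps: int,
                      sparse_start: int, keep_ratio: float) -> float:
    """Ratio of all-dense cost to the mixed dense/sparse cost."""
    dense = attention_pair_count(seq_len, steps, steps, 1.0)
    mixed = attention_pair_count(seq_len, steps, sparse_start, keep_ratio)
    return dense / mixed


if __name__ == "__main__":
    # Illustrative numbers only: 64 steps, sparsity enabled from step 32.
    for keep in (0.5, 0.25, 0.05):
        s = attention_speedup(seq_len=1024, steps=64,
                              sparse_start=32, keep_ratio=keep)
        print(f"keep_ratio={keep:.2f} -> attention speedup {s:.2f}x")
```

Under these assumptions, with half of the steps left dense, the attention speedup stays below 2x however small `keep_ratio` becomes. Enabling sparsity from the first step removes that bound in this toy model, at whatever accuracy cost the approximation carries.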