
D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models

arXiv cs.LG

arXiv:2606.04446v1 Announce Type: cross

Speculative decoding accelerates autoregressive large language model (LLM) inference by drafting multiple tokens and verifying them in a single forward pass of the target model. Recent diffusion-based drafters generate an entire block of tokens in parallel, but they usually commit to a single draft sequence per verification step: once the first mismatch occurs, all subsequent draft tokens are discarded, which limits the acceptance rate.
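To make the single-draft bottleneck concrete, here is a minimal sketch of standard greedy (exact-match) verification for one draft block. It assumes the target model's argmax prediction at every draft position, plus one "bonus" position, is available from the single verification pass. The function name verify_greedy is hypothetical, and this is the generic baseline rule the abstract describes, not the D^2SD method itself. Sampling-based speculative decoding replaces the exact-match test with a rejection-sampling rule.

```python
from typing import List, Tuple


def verify_greedy(draft: List[int], target_preds: List[int]) -> Tuple[List[int], int]:
    """Greedy verification of a single draft block (hypothetical helper).

    draft[i]        : token proposed by the drafter at position i.
    target_preds[i] : target model's argmax token at position i, computed in one
                      forward pass over the prefix plus draft[:i].
    target_preds has len(draft) + 1 entries; the last one is the "bonus" token
    the target predicts after the full draft.

    Returns (tokens committed this step, number of draft tokens accepted).
    """
    if len(target_preds) != len(draft) + 1:
        raise ValueError("target_preds must have len(draft) + 1 entries")

    accepted = 0
    for d, t in zip(draft, target_preds):
        if d != t:
            # First mismatch: every later draft token is discarded, because the
            # target's predictions past this point were conditioned on a wrong token.
            break
        accepted += 1

    # Keep the matched prefix, then append the target's own token at the first
    # mismatch (or the bonus token if the whole draft matched).
    committed = draft[:accepted] + [target_preds[accepted]]
    return committed, accepted


if __name__ == "__main__":
    draft = [5, 8, 3, 9, 2]
    target_preds = [5, 8, 7, 9, 2, 4]
    committed, n_accepted = verify_greedy(draft, target_preds)
    print(committed, n_accepted)  # [5, 8, 7] 2
```

In the usage example, draft positions 3 and 4 happen to equal the target's predictions, but they are still thrown away after the mismatch at position 2. This is the waste that motivates proposing more than one draft sequence per verification.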