AI RESEARCH

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

arXiv CS.AI

ArXi:2606.00487v1 Announce Type: new Using a diffusion model for parallel drafting is a promising approach for speculative decoding. By predicting tokens at multiple future positions in a single forward pass, diffusion drafters substantially reduce drafting latency. However, this shifts the bottleneck to verification: verifying a single sequence limits acceptance length, while verifying large draft trees incurs excessive target-model latency.