AI RESEARCH

TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding

arXiv cs.LG

arXiv:2606.03819v1 Announce Type: new One-shot block drafters for speculative decoding generate the full draft in a single forward pass, achieving strong throughput by eliminating sequential token generation. However, they predict each draft token conditioned only on the prefix context, with no dependence on previously drafted tokens. Because of this non-autoregressive conditioning, the drafter's distribution diverges from the verifier's true autoregressive distribution as draft depth grows.
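
The divergence described above can be illustrated with a toy calculation. The sketch below is not the paper's method: it assumes a hypothetical two-token verifier that is a first-order Markov chain (`P0` and `T` are made-up numbers) and an idealized block drafter that knows the exact per-position marginals given the prefix but, like a one-shot drafter, ignores previously drafted tokens. It then measures the KL divergence between the verifier's joint distribution over a draft and the drafter's product of marginals at each depth.

```python
import itertools
import math

# Hypothetical toy "verifier": a 2-token first-order Markov chain.
# These numbers are illustrative assumptions, not taken from the paper.
P0 = [0.5, 0.5]                 # p(first draft token | prefix)
T = [[0.9, 0.1], [0.1, 0.9]]    # T[a][b] = p(next = b | previous = a)


def joint_prob(seq):
    """Autoregressive probability of a full draft under the toy verifier."""
    p = P0[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= T[prev][nxt]
    return p


def marginals(depth):
    """Per-position marginals given only the prefix (the block drafter's view)."""
    m = [P0[:]]
    for _ in range(depth - 1):
        prev = m[-1]
        m.append([sum(prev[a] * T[a][b] for a in range(2)) for b in range(2)])
    return m


def kl_joint_vs_block(depth):
    """KL(verifier joint || product of per-position marginals), in nats."""
    m = marginals(depth)
    kl = 0.0
    for seq in itertools.product(range(2), repeat=depth):
        p = joint_prob(seq)
        q = math.prod(m[i][tok] for i, tok in enumerate(seq))
        kl += p * math.log(p / q)
    return kl


def divergence_by_depth(max_depth):
    """KL divergence for draft depths 1..max_depth."""
    return [kl_joint_vs_block(d) for d in range(1, max_depth + 1)]


if __name__ == "__main__":
    for depth, kl in enumerate(divergence_by_depth(6), start=1):
        print(f"depth={depth}  KL={kl:.4f} nats")
```

In this toy setting the divergence is zero at depth 1, where conditioning on the prefix alone is exact, and grows with every additional draft position, because each new token's dependence on its predecessors is lost.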