AI RESEARCH
Draft-OPD: On-Policy Distillation for Speculative Draft Models
arXiv CS.CL
arXiv:2605.29343v2 Announce Type: replace Speculative decoding accelerates large language model inference by pairing a target model with a lightweight draft model whose proposed tokens are verified in parallel. A common way to build draft models such as EAGLE3 or DFlash is supervised fine-tuning (SFT) on target-generated trajectories. However, we observe that SFT quickly plateaus: the draft model's acceptance length on test data stops improving.
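
To make the acceptance-length metric concrete, the following is a minimal sketch rather than the paper's implementation. It assumes greedy verification, where a drafted token is accepted only if it equals the token the target model would have produced at that position, and it counts only the accepted drafted tokens (some papers also count the extra token supplied by the target model). The function names `accepted_prefix_length` and `mean_acceptance_length` are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple


def accepted_prefix_length(draft_tokens: Sequence[int],
                           target_tokens: Sequence[int]) -> int:
    """Count drafted tokens accepted in one verification round.

    Assumption: greedy verification, so acceptance stops at the first
    position where the draft token differs from the target token.
    """
    accepted = 0
    for draft_tok, target_tok in zip(draft_tokens, target_tokens):
        if draft_tok != target_tok:
            break
        accepted += 1
    return accepted


def mean_acceptance_length(
    rounds: Sequence[Tuple[Sequence[int], Sequence[int]]]
) -> float:
    """Average accepted-prefix length over many verification rounds."""
    if not rounds:
        return 0.0
    lengths = [accepted_prefix_length(d, t) for d, t in rounds]
    return sum(lengths) / len(lengths)


# Illustrative token IDs only; these are not data from the paper.
example_rounds = [
    ([5, 7, 9, 2], [5, 7, 1, 2]),  # first two drafted tokens match
    ([3, 3, 3, 3], [3, 3, 3, 3]),  # all four drafted tokens match
]
print(mean_acceptance_length(example_rounds))
```

Under this definition, a plateau in acceptance length means the average matched prefix on held-out prompts no longer grows as SFT training continues.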