
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

arXiv cs.LG

arXiv:2602.23881v2 Announce Type: replace

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is largely determined by the acceptance rate, yet standard