Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

arXiv:2606.03113v1 Announce Type: new Large Language Models suffer from slow autoregressive inference. While self-speculative decoding accelerates this process, its efficiency is hampered by static configurations such as fixed exit layers and speculation lengths. We reframe the choice of these configurations as a \textbf{Markov Decision Process} and propose \textbf{LEDE}, a framework that uses offline reinforcement learning.
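
To illustrate why a fixed exit layer and speculation length can be suboptimal, the following is a minimal sketch of a toy cost model. It is not the LEDE method. It assumes that each drafted token is accepted independently with a probability that depends on the exit layer, that drafting one token costs a fraction `exit_layer / num_layers` of a full forward pass, and that verification costs one full pass. The function names, the acceptance rates, and `num_layers=32` are hypothetical values chosen for illustration.

```python
from itertools import product


def expected_tokens(accept_prob: float, spec_len: int) -> float:
    """Expected tokens emitted per draft-verify cycle.

    Assumes i.i.d. acceptance of drafted tokens: the cycle yields the
    accepted prefix plus one token from the verification pass, which
    gives the geometric sum 1 + a + a^2 + ... + a^spec_len.
    """
    if accept_prob >= 1.0:
        return spec_len + 1.0
    return (1.0 - accept_prob ** (spec_len + 1)) / (1.0 - accept_prob)


def speedup(accept_prob: float, exit_layer: int, spec_len: int,
            num_layers: int = 32) -> float:
    """Tokens per unit cost, relative to one token per full forward pass."""
    draft_cost = spec_len * exit_layer / num_layers  # assumed linear in depth
    verify_cost = 1.0                                # one full pass (assumption)
    return expected_tokens(accept_prob, spec_len) / (draft_cost + verify_cost)


def best_config(accept_by_layer: dict, spec_lens: list,
                num_layers: int = 32) -> tuple:
    """Return the (exit_layer, spec_len) pair with the highest modeled speedup."""
    return max(
        product(accept_by_layer, spec_lens),
        key=lambda c: speedup(accept_by_layer[c[0]], c[0], c[1], num_layers),
    )


# Hypothetical acceptance rates per exit layer for two kinds of context.
easy_context = {8: 0.90, 16: 0.95}  # shallow drafts are usually accepted
hard_context = {8: 0.30, 16: 0.80}  # shallow drafts are usually rejected

if __name__ == "__main__":
    print(best_config(easy_context, [2, 4, 8]))
    print(best_config(hard_context, [2, 4, 8]))
```

Under these assumed numbers the best configuration differs between the two contexts, so any single static setting gives up speed on one of them. A learned policy that picks the exit layer and speculation length per decoding state is one way to close that gap.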