AI RESEARCH

Accelerating Constrained Decoding with Token Space Compression

arXiv cs.AI

arXiv:2605.29986v1 Announce Type: new

To guarantee that an LLM's outputs conform to a specified structure, context-free grammar (CFG) decoding engines restrict each next-token choice to tokens that keep the output consistent with a given CFG. While current CFG-constrained decoding engines are highly optimized, the inherent cost of the massive per-step search space (i.e., the entire token vocabulary) results in intractably high overhead for complex CFGs, which is precisely the situation where CFG engines are most useful. In this paper, we ...
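To make the per-step cost described in the abstract concrete, here is a minimal sketch of naive grammar-constrained decoding. It is not the paper's method or any real engine's API: the toy vocabulary `VOCAB`, the balanced-parentheses grammar, and the helper names `is_viable_prefix`, `allowed_mask`, and `constrained_argmax` are all assumptions chosen for illustration. The point is only that the mask is rebuilt by checking every vocabulary entry at every decoding step.

```python
import math

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["(", ")", "((", "))", "()", ")(", "<eos>"]


def is_viable_prefix(text):
    """True if `text` can still be extended to a balanced-parentheses string
    (the CFG  S -> ( S ) S | empty)."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' with no matching '(' can never be repaired
            return False
    return True


def is_complete(text):
    """True if `text` is itself a sentence of the grammar."""
    return is_viable_prefix(text) and text.count("(") == text.count(")")


def allowed_mask(prefix):
    """Naive mask: one grammar check per vocabulary entry, at every step."""
    mask = []
    for tok in VOCAB:
        if tok == "<eos>":
            mask.append(is_complete(prefix))
        else:
            mask.append(is_viable_prefix(prefix + tok))
    return mask


def constrained_argmax(logits, prefix):
    """Pick the highest-scoring token among those the grammar allows."""
    mask = allowed_mask(prefix)
    masked = [score if ok else -math.inf for score, ok in zip(logits, mask)]
    best = max(range(len(VOCAB)), key=masked.__getitem__)
    return VOCAB[best]
```

With made-up logits that favor `)` at the start of generation, `constrained_argmax([0.1, 5.0, 0.2, 4.0, 0.3, 3.0, 0.0], "")` skips the three grammar-violating tokens and selects `()`. The loop in `allowed_mask` is the cost that scales with vocabulary size; in this sketch each check is cheap, but for a complex CFG each check is more expensive and the loop runs on every generated token.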