AI RESEARCH

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

arXiv cs.LG

arXiv:2605.30218v1 Announce Type: new Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens depending on whether it is decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, both of which incur cost even when most decoding steps are stable. We ask whether verification can be applied exclusively to flipped tokens.
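The abstract does not spell out how the tokens at risk of flipping are identified. As a rough illustration only, the sketch below assumes a trigger suggested by the title: a decoding step is sent to verification when the gap between its two largest logits falls below a threshold, since a small gap is where numerical noise from a different batch shape could plausibly change the argmax. The function names, the `threshold` parameter, and the example logits are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def top2_margin(logits: Sequence[float]) -> float:
    """Return the gap between the largest and second-largest logit."""
    if len(logits) < 2:
        raise ValueError("need at least two logits to compute a margin")
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


def needs_verification(logits: Sequence[float], threshold: float) -> bool:
    """Flag a step whose top-2 margin is below the (hypothetical) threshold."""
    return top2_margin(logits) < threshold


def select_steps_to_verify(
    step_logits: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return indices of decoding steps that would be re-checked.

    Steps with a comfortable margin are skipped, so the verification
    cost scales with the number of flagged steps rather than with the
    total number of generated tokens.
    """
    return [
        i
        for i, logits in enumerate(step_logits)
        if needs_verification(logits, threshold)
    ]


# Illustrative logits for three decoding steps (made-up values).
example_steps = [
    [5.0, 1.0, 0.5],   # wide margin: treated as stable
    [2.0, 1.99, 0.1],  # narrow margin: flagged
    [3.0, 3.0, 0.0],   # tie: flagged
]
flagged = select_steps_to_verify(example_steps, threshold=0.05)
```

Under these assumptions only the second and third steps would be verified. How the threshold is chosen, and what the verification step itself does, are left open here because the abstract does not describe them.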