Multi-SPIN: Multi-Access Speculative Inference for Cooperative Token Generation at the Edge

arXiv:2606.04581v1 Announce Type: cross Speculative inference (SPIN) was originally developed as an efficient architecture for accelerating Large Language Models (LLMs). In this work, we propose deploying it in a distributed fashion to enable cooperative token generation in a multiuser edge system. The advantage of this deployment is that it balances the computational load between resource-constrained devices and servers.
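
To make the draft-then-verify idea behind speculative inference concrete, the following is a minimal sketch of its greedy variant, in which a small model drafts a few tokens and a larger model checks them. It is not the Multi-SPIN protocol itself: `target_model`, `draft_model`, `speculative_step`, and the draft length `k` are all hypothetical names and toy choices introduced for illustration, and mapping the draft model to a device and the target model to a server is an assumption based on the description above.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model (assumed to run on the server)."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[Token]) -> Token:
    """Toy stand-in for the small model (assumed to run on the device).

    It agrees with the target except when the context length is a
    multiple of 4, which imitates occasional drafting errors.
    """
    t = target_model(ctx)
    return t if len(ctx) % 4 != 0 else (t + 1) % 50


def speculative_step(ctx: List[Token], draft: Model, target: Model, k: int) -> List[Token]:
    """One draft-then-verify round; returns between 1 and k + 1 new tokens."""
    # Drafting: the small model proposes k tokens autoregressively.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft(ctx + drafted))

    # Verification: accept the longest prefix that matches the target's
    # greedy choice. A real system would score all k positions in a single
    # batched forward pass; this loop is sequential only for clarity.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target(ctx + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)

    # All drafts accepted: the target contributes one extra token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[Token], draft: Model, target: Model,
             n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Generate n_new tokens; also report how many verification rounds ran."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, draft, target, k))
        rounds += 1
    return out[:len(prompt) + n_new], rounds


def greedy_generate(prompt: List[Token], target: Model, n_new: int) -> List[Token]:
    """Baseline: the target model alone, one call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out
```

In this greedy variant the output is identical to what the target model would produce alone, while the number of verification rounds is smaller than the number of generated tokens whenever some drafts are accepted. That reduction is what moves part of the workload onto the drafting side.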