When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

arXiv:2605.30102v1 Announce Type: cross

The design space of agentic AI inference spans two extremes. At one end are frontier large language models (LLMs), typically hosted in the cloud, which offer strong performance across a wide range of tasks at substantially higher cost. At the other end are cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybrid multi-agent systems (MASs) that combine on-device and cloud models offer a promising middle ground, but they also