Cactus Hybrid Router: Gemma4-2B can match Gemini-3.1-Flash-Lite by routing 15-55% of tasks to Gemini and running the rest locally.
r/LocalLLaMA
Last week, we announced the “Simple Attention Network” and trained Needle, a 26M-parameter function-calling model that beats models 10-25x its size. Some LocalLlama Redditors asked if we could make a router model. We have now built “Cactus Hybrid Router”, a 65k-parameter model that decides on the fly whether to complete a task with the edge model or route it to a frontier cloud model. Router performance stays robust even when you quantize the edge model. That said, this is with Cactus Quants, where our 4-bit uniform quantization naturally lands close to fp16.
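
For anyone wondering what "edge or cloud" routing looks like in code, here is a minimal sketch of the general idea, not the actual Cactus implementation. Everything in it is an assumption for illustration: the `HybridRouter` class, the `threshold` parameter, and especially `toy_score`, which is a word-count heuristic standing in for the learned 65k-parameter router.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class HybridRouter:
    """Hypothetical router: sends a task to the edge model or to the cloud."""

    # Assumed interface: returns an estimate in [0, 1] that the edge model
    # can handle the task on its own. The real router is a learned model.
    score_fn: Callable[[str], float]
    # Assumed knob: raising it sends more tasks to the cloud.
    threshold: float = 0.5

    def route(self, task: str) -> str:
        """Return "edge" if the score clears the threshold, else "cloud"."""
        return "edge" if self.score_fn(task) >= self.threshold else "cloud"


def cloud_fraction(router: HybridRouter, tasks: Sequence[str]) -> float:
    """Fraction of tasks the router would send to the cloud model."""
    if not tasks:
        return 0.0
    return sum(router.route(t) == "cloud" for t in tasks) / len(tasks)


def toy_score(task: str) -> float:
    """Stand-in heuristic (NOT the learned router): shorter tasks score higher."""
    return 1.0 / (1.0 + len(task.split()) / 8.0)


if __name__ == "__main__":
    router = HybridRouter(score_fn=toy_score, threshold=0.5)
    tasks = [
        "set a timer",
        "compare the refund policies of these three airlines and draft an "
        "email to each one asking for compensation for my delayed flight",
    ]
    for t in tasks:
        print(router.route(t), "<-", t[:40])
    print("cloud fraction:", cloud_fraction(router, tasks))
```

The threshold is the knob that trades cost for quality: sweeping it is one way you could move the share of tasks sent to the cloud across a range like the 15-55% in the title.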