$16 refactor, 400 steps, 95% routed to open MoE

r/LocalLLaMA
AI Tools

Got tired of $160 Opus bills so I spent a weekend wiring up a routing layer on vLLM 0.8 (2xA100, enable_auto_tool_choice). Getting the tool call parser to cooperate took longer than the actual routing logic. Once it worked though, easy agent steps go to the 21B active MoE and hard steps get Opus. Hunyuan Hy3 preview handled 380 of 400 steps on a 12k line Python repo at ~$0.02 each ($7.60). Opus covered the remaining 20 at $0.40 ($8), so $15.60 all in. I set reasoning to no_think on routine steps which cut token spend by roughly 30%. Final success rate was 93.4.