How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)

r/LocalLLaMA
Generative AI Open Source AI

I'm building a local-first agent - a plain ReAct loop (think, pick a tool, observe, repeat) on a llama.cpp backend - and I want to be precise about a question that usually just gets answered with "it depends." It does depend. So let me split it into two jobs: (a) Heavy one-shot generation - write a 400-line module, refactor a big file. That wants a big model, no argument. In my setup I route this to a dedicated coding model; I don't ask the loop model to do it. (b) The orchestration loop itself - read this, decide which tool, call it with the right arguments, look at the result, react.