The reason small-model agent stacks aren't the default has nothing to do with whether they work

Last June, NVIDIA published a position paper called "Small Language Models are the Future of Agentic AI," and the argument was easy enough to wave off at the time: most of what an agent actually does is unglamorous work like reading input, choosing a tool, calling it, and reshaping the output, none of which needs a 400-billion-parameter model behind it. The proposal was to hand that routine 80% to small specialized models and only fall back to an expensive frontier model when a task genuinely earned it.