Q4_K_M is fine for chat and a trap for agents. Here is math mathing.

r/LocalLLaMA

Saw the Q4_K_M vs Q6 thread earlier and the comments are talking past each other. "Few errors per hour" vs "errors every couple days" sounds like a 24x difference. For chat that's fine. For agentic loops that's the whole game.

Run the math. If your agent does a 30-step tool-calling loop and each step has a 2% chance of producing a malformed arg or picking the wrong tool, end-to-end success is 0.98^30 ≈ 0.55. Coin flip. At Q4_K_M with "few errors per hour" the per-call malformation rate is probably ~3%. 30 steps = 40% completion. At Q6 with "errors every couple days" call it 0.3.
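
If you want to plug in your own numbers, here is a quick Python sketch of the compounding math. It assumes every step fails independently and that a single bad call kills the whole run (no retries, no self-correction), which is a simplification. `chain_success` is just a name made up for illustration, not something from any library.

```python
def chain_success(per_step_error: float, steps: int) -> float:
    """Probability that every one of `steps` calls succeeds.

    Assumption: failures are independent and any single failure
    ends the run (no retries or recovery).
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    return (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Per-step error rates taken from the post: 2% and ~3%.
    for rate in (0.02, 0.03):
        p = chain_success(rate, steps=30)
        print(f"{rate:.1%} per-step error, 30 steps: {p:.3f} end-to-end success")
```

Swap in your own per-step rate and loop length. The takeaway is that the exponent does the damage: a small per-call difference turns into a big completion-rate difference once the loop gets long.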