What's this sub's general opinion on quantizing the KV cache?
r/LocalLLaMA
Generative AI
Assume I'm talking about Qwen3.6b-27b for coding. I hear a lot about quantizing the model weights, but almost no opinions on quantizing the KV cache for this model.
Submitted by /u/misanthrophiccunt