LLaMA-2 70B Has 64 Query Heads and 8 KV Heads. Here Is the Memory Arithmetic Nobody Shows You.

Towards AI

Every explainer on Grouped Query Attention says the same thing.