LLaMA-2 70B Has 64 Query Heads and 8 KV Heads. Here Is the Memory Arithmetic Nobody Shows You.
Towards AI
Every explainer on Grouped Query Attention says the same thing.