I Thought AI Was Slow Because It Wasn't Smart Enough. Turns Out It's Exhausted From Carrying Things.
Dev.to Machine Learning
About This Tutorial
I've been working on a question lately: can an AI model run on a small device without depending on the cloud? I dug through a lot of material, and then one number stopped me cold. A 7B-parameter model stored at 16-bit precision (2 bytes per parameter) has to move roughly 14GB of weight data from memory to the compute units every time it generates a single token. Memory bandwidth on a high-end data-center GPU is around 2TB/s. Do the math: 2TB/s divided by 14GB per token gives a theoretical ceiling of only about 140 tokens per second for a single request, and in practice it's even less. I sat with that for a moment. It's not that the compute isn't fast enough. It's that the carrying is too slow. This problem has a name: the Memory Wall.
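To make the arithmetic concrete, here is a minimal sketch of that back-of-the-envelope bound. It assumes every weight is read from memory exactly once per generated token and ignores everything else (KV cache traffic, activations, batching, compute time), so it is a ceiling, not a prediction. The function name `max_tokens_per_second` is made up for illustration, and the 2 bytes per parameter (16-bit weights) is my assumption for how the 14GB figure comes about.

```python
def max_tokens_per_second(num_params: float,
                          bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only upper bound on decode speed.

    Assumes each generated token requires streaming all model weights
    from memory once, and that nothing else limits throughput.
    """
    weight_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    params_7b = 7e9   # 7B parameters
    gpu_bw = 2e12     # ~2 TB/s, the figure used in the text

    # 16-bit weights: 7e9 * 2 bytes = 14 GB moved per token
    fp16 = max_tokens_per_second(params_7b, 2.0, gpu_bw)
    print(f"7B @ 16-bit, 2 TB/s: {fp16:.1f} tokens/s")

    # Same model with 4-bit weights (0.5 bytes per parameter): 3.5 GB per token
    int4 = max_tokens_per_second(params_7b, 0.5, gpu_bw)
    print(f"7B @ 4-bit,  2 TB/s: {int4:.1f} tokens/s")

    # Hypothetical small device with 50 GB/s of memory bandwidth
    # (an illustrative number, not a measurement of any real hardware)
    edge = max_tokens_per_second(params_7b, 2.0, 50e9)
    print(f"7B @ 16-bit, 50 GB/s: {edge:.1f} tokens/s")
```

The first case works out to roughly 143 tokens per second, which is where the "about 140" figure comes from. The other two cases show the two levers this bound exposes: shrink the bytes you carry per token, or widen the pipe you carry them through.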