Performance When Offloading Large Models to System RAM?

r/LocalLLaMA
AI Hardware

I've noticed that for people running large models, or models that would be cost-prohibitive to hold entirely in GPU VRAM, the dominant strategy is a single GPU plus a large pool of system DRAM to offload the weights to, since VRAM is always more expensive per GB than ordinary DDR5.
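
To make the split concrete, here's a rough back-of-the-envelope sketch of how much of a model would sit in VRAM versus system RAM. The function name, the 2 GB reserve for KV cache and runtime overhead, and the example sizes are all made-up assumptions for illustration, not measurements or any particular tool's behavior.

```python
def split_weights(model_gb: float, vram_gb: float, vram_reserved_gb: float = 2.0):
    """Return (gb_on_gpu, gb_in_system_ram) for a model that may not fit in VRAM.

    vram_reserved_gb is a hypothetical allowance for KV cache, activations,
    and runtime overhead; the real figure depends on context length and backend.
    """
    # VRAM actually available for weights after the reserve is set aside.
    usable_vram = max(vram_gb - vram_reserved_gb, 0.0)
    # Put as much as fits on the GPU; the remainder is offloaded to system RAM.
    on_gpu = min(model_gb, usable_vram)
    in_ram = model_gb - on_gpu
    return on_gpu, in_ram


# Hypothetical example: 140 GB of weights on a single 24 GB card.
on_gpu, in_ram = split_weights(model_gb=140.0, vram_gb=24.0)
print(f"GPU: {on_gpu:.0f} GB, system RAM: {in_ram:.0f} GB")
# prints "GPU: 22 GB, system RAM: 118 GB"
```

With numbers like these, most of the weights end up in DRAM, which is why I'm asking about performance.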