Smaller Models, Same Old Story: Edge AI's Incremental Tailwind

What Happened

Researchers introduced QuBLAST, a post-training quantization framework that shrinks large language models 40–45% by applying mixed precision across network blocks and scaling activations to tame outliers. Tested on Qwen3-8B, Llama3-8B, Mistral, and Falcon-H1, it held perplexity degradation under 5% and, notably, extended coverage to state-space models, which most quantization work ignores. It's solid engineering in an already-crowded field.

Who Gets Hit

The thesis is structural, not stock-specific...
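
A technical aside on the mechanics summarized under What Happened: the sketch below is not QuBLAST's code. It only illustrates the two general ideas named there. The function names, the 16-bit baseline, and the per-block bit widths are hypothetical, chosen so the arithmetic lands in the reported 40–45% range. The scaling rule is a generic SmoothQuant-style per-channel rescaling, swapped in as a stand-in because the summary does not say how QuBLAST scales activations.

```python
def size_reduction(block_params, block_bits, baseline_bits=16):
    """Fraction of storage saved when each block is stored at its own bit width.

    baseline_bits=16 is an assumption (e.g. an FP16 checkpoint).
    """
    baseline = sum(block_params) * baseline_bits
    quantized = sum(p * b for p, b in zip(block_params, block_bits))
    return 1 - quantized / baseline


def smoothing_scales(acts, weights, alpha=0.5):
    """Per-input-channel scales that shift outlier magnitude from activations to weights.

    acts:    rows are samples, columns are input channels
    weights: rows are input channels, columns are output channels
    Assumes every channel has a nonzero activation and a nonzero weight.
    """
    scales = []
    for c in range(len(weights)):
        a_max = max(abs(row[c]) for row in acts)
        w_max = max(abs(w) for w in weights[c])
        scales.append((a_max ** alpha) / (w_max ** (1 - alpha)))
    return scales


def apply_scaling(acts, weights, scales):
    """Divide activations and multiply weights by the same per-channel scale."""
    scaled_acts = [[v / s for v, s in zip(row, scales)] for row in acts]
    scaled_weights = [[w * s for w in row] for row, s in zip(weights, scales)]
    return scaled_acts, scaled_weights


def matmul(a, b):
    """Plain-Python matrix product, enough for a toy check."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical model: four equal-sized blocks, three stored at 8 bits, one at 12.
saved = size_reduction([1, 1, 1, 1], [8, 8, 8, 12])

# Toy layer whose second input channel carries activation outliers.
acts = [[0.5, 40.0], [-0.3, -35.0]]
weights = [[0.2, -0.1], [0.05, 0.3]]
scales = smoothing_scales(acts, weights)
scaled_acts, scaled_weights = apply_scaling(acts, weights, scales)

original_out = matmul(acts, weights)
rescaled_out = matmul(scaled_acts, scaled_weights)
```

Under these made-up settings the average width is 9 bits against a 16-bit baseline, which is where a saving in the low 40s of percent comes from. The rescaling leaves the layer's output unchanged up to floating-point error while shrinking the largest activation, which is what makes the activations easier to round onto a coarse integer grid.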