AI RESEARCH
Asymmetric Scaling Laws from Sparse Features
arXiv cs.LG
We introduce a model of neural scaling laws under sparse activations. In this model, test loss is often dominated by rare coordinates that never appear in the training inputs. This mechanism induces a novel bottleneck that is absent from dense models.
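A toy illustration of the unseen-coordinate mechanism may help. This is a hypothetical sketch and not the paper's model. It assumes that each sample activates exactly one coordinate k, drawn with Zipf-like probability p_k ∝ k^(-alpha). It also assumes that a coordinate seen at least once during training costs nothing at test time, while an unseen coordinate costs 1 whenever it is active. Under these assumptions, expected test loss equals the "missing mass" sum_k p_k (1 - p_k)^n. All names and parameter values here (`zipf_probs`, `alpha=1.5`, `n_features=100_000`) are illustrative choices.

```python
import numpy as np


def zipf_probs(n_features: int, alpha: float) -> np.ndarray:
    """Hypothetical coordinate frequencies: p_k proportional to k**(-alpha), summing to 1."""
    k = np.arange(1, n_features + 1, dtype=float)
    p = k ** (-alpha)
    return p / p.sum()


def expected_unseen_loss(p: np.ndarray, n_train: int) -> float:
    """Expected test loss from coordinates absent from n_train i.i.d. samples.

    Toy assumptions (not the paper's model): each sample activates one
    coordinate k with probability p[k]; a coordinate seen at least once in
    training incurs zero test loss, an unseen one incurs loss 1 when active.
    This is the expected missing mass, sum_k p_k * (1 - p_k)**n_train.
    """
    return float(np.sum(p * (1.0 - p) ** n_train))


def sampled_unseen_loss(p: np.ndarray, n_train: int, rng: np.random.Generator) -> float:
    """One Monte Carlo draw of the same quantity, used as a sanity check."""
    train = rng.choice(p.size, size=n_train, p=p)  # indices of active coordinates
    seen = np.zeros(p.size, dtype=bool)
    seen[train] = True
    return float(p[~seen].sum())  # probability mass of never-seen coordinates


if __name__ == "__main__":
    p = zipf_probs(n_features=100_000, alpha=1.5)
    ns = np.array([100, 300, 1_000, 3_000, 10_000])
    losses = np.array([expected_unseen_loss(p, int(n)) for n in ns])
    # Fit log(loss) = slope * log(n) + c to read off a scaling exponent.
    slope = np.polyfit(np.log(ns), np.log(losses), 1)[0]
    for n, loss in zip(ns, losses):
        print(f"n={n:>6}  unseen-coordinate loss={loss:.4f}")
    print(f"fitted log-log slope: {slope:.3f}")
```

In this toy setting, the loss follows a power law in the training-set size n. The fitted slope should land roughly near -(1 - 1/alpha), which is about -1/3 for alpha = 1.5. Rare coordinates set the rate because a coordinate contributes to the loss only until it has been observed at least once.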