LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

arXiv:2502.12120v3 Announce Type: replace Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. Recently, loss-to-loss scaling laws that relate losses across pre