AI RESEARCH
$M^3$ Scaling Law: Optimizing Multi-Epoch, Multi-Lingual, and Multi-Stage Training for Low-Resource Language Models
arXiv CS.CL
arXiv:2410.12325v2 Announce Type: replace
In this paper, we study a fundamental design problem in pre