100 Trillion+ pretraining data??? This is the largest dataset I've seen a model trained on.
r/LocalLLaMA
Usually we see 27-50 trillion tokens for most models: Kimi, MiMo, DeepSeek. They seem to have doubled the pre