100 Trillion+ Pretraining data??? This is the largest data I've seen a model being trained on.
Signal: 35
Hype: 55
In three lines
A Reddit user reports a model (likely Minimax M3) trained on 100+ trillion tokens, at least double the 27-50T reported for Kimi, Mimo, and Deepseek. The author doubts the model exceeds 500B parameters despite this massive data scaling.
Summary generated by Claude — human-verified