Reddit r/LocalLLaMA

100 Trillion+ Pretraining data??? This is the largest dataset I've seen a model being trained on.

Signal: 35
Hype: 55
In three lines: A Reddit user reports a model (likely MiniMax M3) trained on 100+ trillion tokens, roughly two to four times the 27-50T cited for Kimi, MiMo, and DeepSeek. The author doubts the model exceeds 500B parameters despite this massive data scaling.
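To put the numbers in the summary side by side, here is a minimal sketch of the arithmetic. It assumes the figures exactly as quoted (100T tokens, 27-50T for the comparison models, and 500B parameters as the poster's guessed ceiling); the function and constant names are illustrative and do not come from the source post.

```python
def tokens_per_parameter(tokens: float, parameters: float) -> float:
    """Return how many training tokens the model sees per parameter."""
    if parameters <= 0:
        raise ValueError("parameters must be positive")
    return tokens / parameters


# Figures quoted in the summary; none of them are confirmed by the model's authors.
TOKENS_REPORTED = 100e12       # "100+ trillion" tokens, taken at the lower bound
PARAMS_UPPER_GUESS = 500e9     # the poster's guessed ceiling on model size
COMPARISON_RANGE = (27e12, 50e12)  # token counts cited for the comparison models

ratio = tokens_per_parameter(TOKENS_REPORTED, PARAMS_UPPER_GUESS)
multiples = [TOKENS_REPORTED / t for t in COMPARISON_RANGE]

print(f"{ratio:.0f} tokens per parameter")
print(f"{multiples[1]:.1f}x to {multiples[0]:.1f}x the comparison token counts")
```

Because 500B is a guessed upper bound, the tokens-per-parameter figure is a lower bound: a smaller model trained on the same data would see proportionally more tokens per parameter.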
DeepSeek · Benchmarks

Summary generated by Claude — human-verified