Reddit r/MachineLearning

Training GPT-like model on non-language series [R]

Signal: 35
Hype: 15
In three lines: A researcher trains Transformer-decoder models (100M–500M parameters) on 750M tokens of non-language series. Setup: AdamW, lr=1e-3, batch size of 4M tokens, 16 layers. The model fails to learn basic autoregressive behavior and repeatedly generates a single token.
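As a rough sanity check on the reported setup, the figures above can be turned into two derived quantities: optimizer updates per pass over the data, and training tokens per parameter. This is only back-of-the-envelope arithmetic, assuming the 4M-token batch is the size of one optimizer update (the summary does not say whether it is per step or accumulated); the function names are illustrative and not from the original post.

```python
# Figures taken from the summary; everything else is derived arithmetic.
DATASET_TOKENS = 750_000_000                 # 750M training tokens
BATCH_TOKENS = 4_000_000                     # assumed: tokens per optimizer update
PARAM_COUNTS = (100_000_000, 500_000_000)    # smallest and largest model


def steps_per_epoch(dataset_tokens: int, batch_tokens: int) -> float:
    """Optimizer updates needed to see every token once."""
    return dataset_tokens / batch_tokens


def tokens_per_param(dataset_tokens: int, params: int) -> float:
    """Unique training tokens available per model parameter."""
    return dataset_tokens / params


if __name__ == "__main__":
    steps = steps_per_epoch(DATASET_TOKENS, BATCH_TOKENS)
    print(f"updates per epoch: {steps}")
    for params in PARAM_COUNTS:
        ratio = tokens_per_param(DATASET_TOKENS, params)
        print(f"{params:,} params: {ratio} tokens per parameter")
```

Under that assumption, one epoch is fewer than 200 updates, so the number of epochs and the learning-rate schedule matter a great deal here. This does not by itself explain the single-token collapse; it only shows which quantities may be worth checking first.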
GPT · Code generation · Benchmarks

Summary generated by Claude — human-verified