arXiv cs.LG

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

Signal: 82 · Hype: 25
In three lines: Introspective X Training (IXT) uses a thinking reward model to annotate data with natural-language feedback from pre-training onward. On 7.5-12B LLMs trained on up to 18T tokens, the method improves compute efficiency by 2.8x and, in math and code domains, reaches performance levels that are otherwise unattainable.
Reinforcement learning · Reasoning · Code generation · Benchmarks

Summary generated by Claude — human-verified