Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages
Signal: 82
Hype: 25
In three lines: Introspective X Training (IXT) uses a thinking reward model to annotate training data with natural-language feedback from pre-training onward. On 7.5-12B-parameter LLMs trained on up to 18T tokens, the method improves compute efficiency by 2.8x and reaches performance levels in math and code that are otherwise unattainable.
Summary generated by Claude — human-verified