arXiv cs.LG·21 May 2026

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

In three lines: Introspective X Training (IXT) uses a thinking reward model to annotate training data with natural language feedback, starting from pre-training. On 7.5-12B LLMs trained on up to 18T tokens, the method improves compute efficiency by 2.8x and reaches performance levels in math and code that are otherwise unattainable.

Reinforcement learning · Reasoning · Code generation · Benchmarks

Summary generated by Claude — human-verified
