Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding
Signal: 78
Hype: 15
In three lines
Regret Pre-training introduces a self-supervised framework based on Learning Using Privileged Information (LUPI), using a dual-view architecture that generates Student (causal) and Teacher (future-conditioned) distributions. On OLMoE-1B-7B after 4B tokens, GlobalRegret and LocalRegret reach 33.9% and 32.2% average accuracy versus 30.2% for the baseline, with an 18.1 percentage point gain on BoolQ. The method adds no parameters.
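To put the reported averages in perspective, here is a minimal sketch that converts them into absolute gains (percentage points) and relative gains over the baseline. The three accuracy figures come from the summary; the names `baseline`, `reported`, `gains`, and `summary` are illustrative only and do not come from the paper.

```python
# Average accuracies (%) as quoted in the summary; nothing else is assumed.
baseline = 30.2
reported = {"GlobalRegret": 33.9, "LocalRegret": 32.2}


def gains(acc, base):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = acc - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper structure: variant name -> (pp gain, relative gain).
summary = {name: gains(acc, baseline) for name, acc in reported.items()}

for name, (pp, rel) in summary.items():
    print(f"{name}: +{pp} pp absolute, +{rel}% relative to baseline")
```

The average gains work out to 3.7 and 2.0 percentage points, so the 18.1 percentage point BoolQ improvement is far larger than the typical per-benchmark gain and should be read as an outlier rather than a representative result.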
Summary generated by Claude — human-verified