arXiv cs.CL

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

Signal: 78
Hype: 15
In three lines: Regret Pre-training introduces a self-supervised framework based on Learning Using Privileged Information (LUPI). Its dual-view architecture generates a Student (causal) distribution and a Teacher (future-conditioned) distribution. On OLMoE-1B-7B after 4B tokens, GlobalRegret and LocalRegret reach 33.9% and 32.2% average accuracy, respectively, versus a 30.2% baseline, with an 18.1 pp gain on BoolQ. The method adds no parameters.
Papers · Reasoning · Fine-tuning

Summary generated by Claude — human-verified