arXiv cs.CL · 3 June 2026

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

In three lines: Regret Pre-training introduces a self-supervised framework based on LUPI (learning using privileged information). Its dual-view architecture generates a Student (causal) distribution and a Teacher (future-conditioned) distribution. On OLMoE-1B-7B after 4B tokens, GlobalRegret and LocalRegret reach 33.9% and 32.2% average accuracy against a 30.2% baseline (gains of 3.7 and 2.0 percentage points), with an 18.1 percentage point gain on BoolQ. The method adds no parameters.

Summary generated by Claude — human-verified
