arXiv cs.CL·19 May 2026

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

In three lines: TABOM, a post-training method for Diffusion Language Models, aligns optimization with the multi-step, easy-to-hard decoding trajectory observed at inference. By modeling unmasking preferences with a Boltzmann distribution, it derives a tractable pairwise ranking objective that reduces the training-inference discrepancy and improves performance on new domains.
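
The summary does not spell out TABOM's objective, so the following is only a generic illustration of why a Boltzmann (softmax) model over per-position scores leads to a pairwise ranking loss. It is not the paper's implementation. If position i is picked with probability proportional to exp(s_i / T), then restricted to a pair {i, j} the chance that i is picked first is sigmoid((s_i - s_j) / T), and the negative log of that probability is a standard pairwise logistic loss. The function names, the scores, and the unmasking order below are all hypothetical.

```python
import math


def pairwise_preference(score_i, score_j, temperature=1.0):
    """P(i is chosen before j) under a Boltzmann distribution restricted to {i, j}."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j) / temperature))


def pairwise_ranking_loss(scores, order, temperature=1.0):
    """Mean negative log-likelihood over every (earlier, later) pair in `order`.

    `scores[k]` is a hypothetical unmasking score for position k, and `order`
    lists positions in the sequence they were unmasked (easy to hard).
    """
    total, count = 0.0, 0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            p = pairwise_preference(scores[order[a]], scores[order[b]], temperature)
            total += -math.log(p)
            count += 1
    return total / count if count else 0.0


# Hypothetical data: four masked positions, unmasked in the order 2, 0, 3, 1.
order = [2, 0, 3, 1]
scores_aligned = [0.5, -1.0, 2.0, 0.0]            # higher score = unmasked earlier
scores_misaligned = [-s for s in scores_aligned]  # reversed preferences

loss_aligned = pairwise_ranking_loss(scores_aligned, order)
loss_misaligned = pairwise_ranking_loss(scores_misaligned, order)
```

In this sketch, scores that agree with the observed unmasking order give a lower loss than scores that contradict it, which is the sense in which a ranking objective of this kind can pull training toward the inference-time trajectory.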

Fine-tuning Reasoning Papers

Summary generated by Claude — human-verified