arXiv cs.CL

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

Signal: 75
Hype: 15
In three lines: TABOM, a post-training method for Diffusion Language Models, aligns optimization with the multi-step, easy-to-hard decoding trajectory observed at inference. Via Boltzmann modeling of unmasking preferences, it derives a tractable pairwise ranking objective that reduces training-inference discrepancy and improves performance on new domains.
Fine-tuning · Reasoning · Papers

Summary generated by Claude — human-verified