Back to feed
arXiv cs.CL·

Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention

Signal
78
Hype
25
In three linesSPOT (Surgical Post-Training) is an on-policy distillation framework that injects reasoning capabilities into LLMs while preserving prior knowledge. With 4k rectified math pairs, it improves Qwen3-8B by 6.2% on average in 16 minutes on 8x H800, using KL-constrained reward formulation and minimal-edit error correction pipeline.
Read source
Your take?
Reinforcement learningReasoningFine-tuningQwen

Summary generated by Claude — human-verified