Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention
Signal
78
Hype
25
In three linesSPOT (Surgical Post-Training) is an on-policy distillation framework that injects reasoning capabilities into LLMs while preserving prior knowledge. With only 4k rectified math pairs, it improves Qwen3-8B by 6.2% on average in 16 minutes on 8x H800 GPUs. The approach uses KL-constrained reward formulation to mitigate catastrophic forgetting.Read source
Your take?
Summary generated by Claude — human-verified