Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention
Signal
78
Hype
25
In three linesSPOT (Surgical Post-Training) is an on-policy distillation framework that injects reasoning capabilities into LLMs while preserving prior knowledge. With 4k rectified math pairs, it improves Qwen3-8B by 6.2% on average in 16 minutes on 8x H800, using KL-constrained reward formulation and minimal-edit error correction pipeline.Read source
Your take?
Summary generated by Claude — human-verified