arXiv cs.CL·19 May 2026

Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention

Signal

Hype

In three linesSPOT (Surgical Post-Training) is an on-policy distillation framework that injects reasoning capabilities into LLMs while preserving prior knowledge. With 4k rectified math pairs, it improves Qwen3-8B by 6.2% on average in 16 minutes on 8x H800, using KL-constrained reward formulation and minimal-edit error correction pipeline.

Read source

Your take?

Reinforcement learning Reasoning Fine-tuning Qwen

Summary generated by Claude — human-verified

Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention

Other angles on this story