arXiv cs.CL

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Signal: 78 · Hype: 15
In three lines

MixSD is a fine-tuning method that needs no external teacher. It injects knowledge by dynamically mixing tokens from two conditionals of the same model: an expert branch that observes the injected fact, and a naive branch that reflects the model's original priors. On QA and knowledge-editing benchmarks, MixSD retains up to 100% of the base model's capabilities, versus 1% for standard supervised fine-tuning (SFT).
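
To make the token-mixing idea concrete, here is a toy sketch in Python. It is not the paper's algorithm: the summary does not state the mixing rule, so the per-token coin flip (`p_expert`), the function names, and the hard-coded toy conditionals (including the made-up answer "Zubrowka") are all assumptions chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]  # token -> probability


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a categorical distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def mix_decode(
    expert_next: Callable[[List[str]], Dist],
    naive_next: Callable[[List[str]], Dist],
    p_expert: float,
    max_len: int,
    rng: random.Random,
    eos: str = "<eos>",
) -> Tuple[List[str], List[str]]:
    """Build one sequence by choosing, per token, which branch to sample from.

    ASSUMPTION: the branch is picked by an independent coin flip with
    probability p_expert. The actual MixSD rule may differ.
    """
    prefix: List[str] = []
    sources: List[str] = []
    for _ in range(max_len):
        use_expert = rng.random() < p_expert
        dist = expert_next(prefix) if use_expert else naive_next(prefix)
        token = sample(dist, rng)
        if token == eos:
            break
        prefix.append(token)
        sources.append("expert" if use_expert else "naive")
    return prefix, sources


# Hypothetical stand-ins for the two conditionals of one model.
def expert_next(prefix: List[str]) -> Dist:
    # Branch that sees the injected fact in its context.
    answer = ["The", "capital", "is", "Zubrowka"]
    return {answer[len(prefix)]: 1.0} if len(prefix) < len(answer) else {"<eos>": 1.0}


def naive_next(prefix: List[str]) -> Dist:
    # Branch that only has the model's original priors.
    answer = ["The", "capital", "is", "unknown"]
    return {answer[len(prefix)]: 1.0} if len(prefix) < len(answer) else {"<eos>": 1.0}


if __name__ == "__main__":
    tokens, sources = mix_decode(expert_next, naive_next, 0.5, 8, random.Random(0))
    print(list(zip(tokens, sources)))
```

In a real setup, both branches would be the same language model called with and without the fact in its prompt, and the mixed sequence would serve as the fine-tuning target.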
Fine-tuning · Reasoning · Papers

Summary generated by Claude — human-verified