arXiv cs.CL

"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework

Signal: 72
Hype: 25
In three lines: COMPACT, a multi-teacher chain-of-thought (CoT) distillation framework, adaptively fuses supervision from multiple LLM teachers into compact student models. It dynamically weights teacher gradients using three metrics: graph-based consensus, mutual-information-based adaptability, and loss-based difficulty. It achieves state-of-the-art results across benchmarks while mitigating catastrophic forgetting.
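
To make the weighting idea concrete, here is a minimal sketch of how three per-teacher scores could be turned into fusion weights. It is not the paper's formulation: the function names `teacher_weights` and `fused_loss`, the additive combination of scores, the softmax normalization, and the `temperature` parameter are all assumptions chosen for illustration. The sketch weights per-teacher losses, which is equivalent to weighting their gradients only if the weights are treated as constants during backpropagation.

```python
import math
from typing import List, Sequence


def teacher_weights(
    consensus: Sequence[float],
    adaptability: Sequence[float],
    difficulty: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Turn three per-teacher scores into normalized weights.

    Assumption (not from the paper): each score is oriented so that
    higher means "trust this teacher more", the scores are simply summed,
    and a softmax maps the sums to weights that add up to 1.
    """
    if not (len(consensus) == len(adaptability) == len(difficulty)):
        raise ValueError("all score lists must have one entry per teacher")
    if len(consensus) == 0:
        raise ValueError("at least one teacher is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    combined = [
        (c + a + d) / temperature
        for c, a, d in zip(consensus, adaptability, difficulty)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(combined)
    exps = [math.exp(s - peak) for s in combined]
    total = sum(exps)
    return [e / total for e in exps]


def fused_loss(teacher_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-teacher distillation losses."""
    if len(teacher_losses) != len(weights):
        raise ValueError("need exactly one weight per teacher loss")
    return sum(w * l for w, l in zip(weights, teacher_losses))


# Hypothetical scores for two teachers on one training example.
weights = teacher_weights([0.9, 0.1], [0.5, 0.5], [0.2, 0.2])
loss = fused_loss([1.2, 0.8], weights)
```

In this sketch, a lower `temperature` concentrates the weight on the highest-scoring teacher, while a higher one moves the weights toward a uniform average.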
Reasoning, Fine-tuning, Papers

Summary generated by Claude — human-verified