"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework
Signal: 72
Hype: 25
In three lines
COMPACT, a multi-teacher CoT distillation framework, adaptively fuses supervision from multiple LLMs into compact student models. It dynamically weights teacher gradients using three metrics: graph-based consensus, mutual-information-based adaptability, and loss-based difficulty. It reports state-of-the-art results across benchmarks while mitigating catastrophic forgetting.
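To make the weighting idea concrete, here is a minimal sketch of how three per-teacher scores might be combined into fusion weights. The summary does not give COMPACT's formulas, so everything here is an assumption for illustration: the function names `fuse_teacher_weights` and `weighted_distillation_loss`, the choice to sum the three scores, the softmax normalization, the `temperature` parameter, and all numeric values are hypothetical. It is not the paper's method.

```python
import math


def fuse_teacher_weights(consensus, adaptability, difficulty, temperature=1.0):
    """Hypothetical fusion rule: add the three per-teacher scores, then softmax.

    Each argument is a list with one score per teacher. Summing the scores and
    normalizing with a softmax is an assumption made for this sketch only.
    """
    if not (len(consensus) == len(adaptability) == len(difficulty)):
        raise ValueError("score lists must have one entry per teacher")
    combined = [c + a + d for c, a, d in zip(consensus, adaptability, difficulty)]
    peak = max(combined)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in combined]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_distillation_loss(teacher_losses, weights):
    """Convex combination of per-teacher losses.

    With the weights held constant, the gradient of this sum equals the
    weighted sum of the per-teacher gradients.
    """
    return sum(w * l for w, l in zip(weights, teacher_losses))


# Made-up scores for three teachers, purely for illustration.
weights = fuse_teacher_weights(
    consensus=[0.9, 0.4, 0.6],
    adaptability=[0.7, 0.5, 0.2],
    difficulty=[0.3, 0.8, 0.5],
)
loss = weighted_distillation_loss([1.2, 0.8, 1.5], weights)
```

Because the weights sum to one, the fused loss always lies between the smallest and largest per-teacher loss. A lower `temperature` concentrates weight on the highest-scoring teacher, and a higher one moves toward a uniform average.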
Summary generated by Claude — human-verified