Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
Signal: 72 | Hype: 18
In three lines: ProxyCoT, a chain-of-thought (CoT) fine-tuning method, improves reasoning over long contexts (up to 10M tokens) by transferring reasoning capabilities learned on short proxy contexts to the full contexts, using reinforcement learning (RL) or distillation followed by supervised fine-tuning. The method is reported to deliver performance gains with reduced computational overhead and to generalize across domains.
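The transfer step can be illustrated with a minimal sketch. It assumes, without confirmation from the source, that a proxy is a short excerpt of the full context and that the reasoning trace produced on that proxy (by an RL-trained or teacher model, not shown here) is re-attached to the full context to form a supervised fine-tuning example. `Example`, `build_proxy`, and `to_sft_example` are hypothetical names, and the word-overlap selection is a simple stand-in, not ProxyCoT's actual proxy construction.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Example:
    context: str    # input document(s) the model sees
    question: str
    reasoning: str  # chain-of-thought trace (empty before generation)
    answer: str


def build_proxy(full_context: str, question: str, k: int = 2) -> str:
    """Hypothetical proxy builder: keep the k paragraphs sharing the most
    words with the question, preserving their original order."""
    paragraphs = [p for p in full_context.split("\n\n") if p.strip()]
    q_words = set(question.lower().split())
    # Rank paragraph indices by word overlap with the question (stable sort).
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: len(q_words & set(paragraphs[i].lower().split())),
        reverse=True,
    )
    keep = sorted(ranked[:k])  # restore document order
    return "\n\n".join(paragraphs[i] for i in keep)


def to_sft_example(full: Example, proxy_trace: Example) -> Example:
    """Assumed transfer step: pair the reasoning and answer produced on the
    short proxy with the full long context as the SFT target."""
    return replace(full, reasoning=proxy_trace.reasoning, answer=proxy_trace.answer)


# Usage: build a proxy, pretend a teacher reasoned over it, then transfer.
doc = "Alice lives in Paris.\n\nThe weather is mild.\n\nBob lives in Rome."
q = "Where does Bob live?"
proxy = build_proxy(doc, q, k=1)
trace = Example(proxy, q, "The excerpt says Bob lives in Rome.", "Rome")
sft = to_sft_example(Example(doc, q, "", ""), trace)
print(proxy)
```

Keeping the proxy in document order is a design choice for the sketch: it keeps the short excerpt readable as a coherent passage, so a trace written over it still makes sense when attached to the full context.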
Summary generated by Claude — human-verified