arXiv cs.CL

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Signal: 72 · Hype: 18
In three lines: ProxyCoT, a chain-of-thought fine-tuning method, improves reasoning over long contexts (up to 10M tokens) by transferring reasoning capabilities from short proxy contexts to full contexts via RL/distillation followed by supervised fine-tuning. It reports performance gains with reduced computational overhead, along with cross-domain generalization.
Reasoning · Fine-tuning · Reinforcement learning · Prompt engineering

Summary generated by Claude — human-verified