arXiv cs.CL · 21 May 2026

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

In three lines: ProxyCoT, a chain-of-thought fine-tuning method, improves reasoning over long contexts (up to 10M tokens) by transferring reasoning capabilities from short proxy contexts to full contexts, using RL/distillation followed by supervised fine-tuning. It reports performance gains with reduced computational overhead, along with cross-domain generalization.

Reasoning, Fine-tuning, Reinforcement learning, Prompt engineering

Summary generated by Claude — human-verified
