arXiv cs.CL

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Signal
72
Hype
28
In three lines
Conditional Entropy Shaping (CES) dynamically controls token-level entropy to balance reasoning conciseness and accuracy. Implemented on DeepSeek-R1-Distill-7B, CES penalizes high-entropy tokens on correct reasoning paths and rewards them on incorrect paths. Results: improved accuracy with reduced response length across 12 mathematical benchmarks.
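To make the sign convention concrete, here is a minimal sketch of an entropy-conditioned shaping term. It is not the paper's objective: the summary does not give the formula, so the function names, the `threshold` cutoff for "high entropy", and the `beta` strength are all hypothetical, chosen only to illustrate "penalize on correct paths, reward on incorrect paths".

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_shaping_terms(
    token_dists: Sequence[Sequence[float]],
    is_correct: bool,
    threshold: float = 0.5,  # hypothetical cutoff for "high entropy"
    beta: float = 0.1,       # hypothetical shaping strength
) -> List[float]:
    """Per-token shaping term, conditioned on whether the path was correct.

    Assumed rule (illustrative only): tokens whose entropy exceeds
    `threshold` get -beta * H on a correct path and +beta * H on an
    incorrect path; all other tokens get 0.
    """
    sign = -1.0 if is_correct else 1.0
    terms = []
    for dist in token_dists:
        h = token_entropy(dist)
        terms.append(sign * beta * h if h > threshold else 0.0)
    return terms


if __name__ == "__main__":
    # One confident token and one maximally uncertain binary token.
    dists = [[1.0, 0.0], [0.5, 0.5]]
    print(entropy_shaping_terms(dists, is_correct=True))
    print(entropy_shaping_terms(dists, is_correct=False))
```

In this toy setup the confident token contributes nothing on either path, while the uncertain token (entropy ln 2, about 0.693 nats) is penalized when the path is correct and rewarded when it is incorrect. In a real training loop a term like this would be added to the per-token reward or advantage, which is again an assumption rather than something the summary states.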
DeepSeek · Reasoning · Reinforcement learning · Benchmarks

Summary generated by Claude — human-verified