arXiv cs.CL

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Signal: 72 · Hype: 28
In three lines: Thoughts-as-Planning formalizes reasoning-chain optimization as sequential decision-making in a latent semantic space. The framework learns a latent world model that simulates how edits to a reasoning chain affect the model's outputs. It supports edits at multiple scales (token, segment, and instruction), planned either by gradient descent or by reinforcement learning.
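The core idea is to choose an edit by planning through a learned model of its effects instead of by trial and error on the real LLM. A toy version makes this concrete. The sketch below is not the paper's method. `world_model` (a linear map from latent state and edit vector to the next latent state), `reward` (negative squared distance to a "good output" latent), and `plan_edit` are hypothetical stand-ins chosen only to keep it self-contained. It illustrates just the gradient-descent planning route. The reinforcement-learning route and the token/segment/instruction edit scales are not modeled.

```python
"""Toy sketch: planning a reasoning-chain edit in latent space by gradient ascent.

All components are hypothetical stand-ins, not the paper's models:
a linear latent 'world model' z_next = z + B @ a and a quadratic reward.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for plain Python lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def world_model(z: Vector, a: Vector, B: Matrix) -> Vector:
    """Hypothetical latent dynamics: predicted latent state after applying edit a."""
    return [zi + di for zi, di in zip(z, matvec(B, a))]


def reward(z: Vector, target: Vector) -> float:
    """Hypothetical reward: negative squared distance to a 'good output' latent."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def plan_edit(z: Vector, B: Matrix, target: Vector,
              steps: int = 200, lr: float = 0.1) -> Vector:
    """Gradient ascent on the edit vector a to maximize reward(world_model(z, a))."""
    a = [0.0] * len(B[0])
    Bt = transpose(B)
    for _ in range(steps):
        z_next = world_model(z, a, B)
        # Analytic gradient of the quadratic reward: d/da = -2 * B^T (z_next - target)
        residual = [zn - t for zn, t in zip(z_next, target)]
        grad = [-2.0 * g for g in matvec(Bt, residual)]
        a = [ai + lr * gi for ai, gi in zip(a, grad)]
    return a


# Usage: with B = I, the planned edit should move the latent state onto the target.
z0 = [0.0, 0.0]
B = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, -2.0]
edit = plan_edit(z0, B, target)
print([round(x, 3) for x in edit])  # → [1.0, -2.0]
```

Because the world model is differentiable, the planner never queries the real model while it optimizes the edit. Only the final edit would be applied and checked. In a real system the latent encoder, dynamics, and reward would be learned networks, and discrete edits would need a relaxation or an RL policy instead of plain gradient steps.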
Reasoning · Reinforcement learning · Prompt engineering · Papers

Summary generated by Claude — human-verified