Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward
Signal: 75
Hype: 25
In three lines
DecomposeR is a deep research framework that trains Qwen3-8B in two RL stages. Planner RL first learns typed DAG structures and query decomposition; answerer RL then learns branch execution and synthesis. The framework reportedly gains 5.1 to 8.0 points on long-form benchmarks, which the summary attributes to explicit planning and structured rewards.
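
To make the idea of a typed DAG plan concrete, here is a minimal sketch of how such a plan could be represented and ordered for branch execution. Everything in it is an assumption for illustration: the `PlanNode` fields, the node type labels, and the `is_valid_plan` check are hypothetical and are not taken from the paper. The check only shows the kind of structural property (known types, resolvable dependencies, no cycles) that a structure-aware reward could plausibly build on; the summary does not specify the actual reward.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class PlanNode:
    """One sub-query in a plan. All field names are hypothetical."""
    node_id: str
    node_type: str  # illustrative labels such as "search" or "synthesize"
    query: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(nodes: list[PlanNode]) -> list[str]:
    """Return node ids so that every node comes after its dependencies."""
    graph = {n.node_id: set(n.depends_on) for n in nodes}
    # static_order() raises CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


def is_valid_plan(nodes: list[PlanNode], allowed_types: set[str]) -> bool:
    """Structural check: unique ids, known types, known deps, acyclic."""
    ids = {n.node_id for n in nodes}
    if len(ids) != len(nodes):
        return False
    for n in nodes:
        if n.node_type not in allowed_types:
            return False
        if any(dep not in ids for dep in n.depends_on):
            return False
    try:
        execution_order(nodes)
    except CycleError:
        return False
    return True
```

In this sketch, two independent `search` branches feeding one `synthesize` node would be ordered with the synthesis step last, which mirrors the plan-then-execute split described in the summary.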
Summary generated by Claude — human-verified