EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control
Signal
72
Hype
18
In three lines
EfficientTDMPC improves sample efficiency for continuous control in model-based reinforcement learning. The method uses an ensemble of dynamics models, averages return estimates across multiple rollout depths, and adds an uncertainty penalty to the planner objective. It achieves state-of-the-art results on HumanoidBench-Hard and DMC hard in low-data regimes.
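The three ingredients named in the summary can be combined in several ways, and the summary does not give the paper's exact formula. As a minimal sketch of one plausible reading, the Python below scores a single candidate action sequence. For each ensemble member it averages the bootstrapped return estimates over depths 1 to H, then subtracts a multiple of the ensemble's standard deviation. Everything in it is an assumption made for illustration, not taken from the source: the name `planner_objective`, the toy linear dynamics, the quadratic reward and value functions, the use of standard deviation as the uncertainty measure, and the `gamma` and `penalty` values.

```python
import numpy as np


def planner_objective(s0, actions, models, reward_fn, value_fn,
                      gamma=0.99, penalty=1.0):
    """Score one action sequence (hypothetical form, not the paper's formula)."""
    per_model = []
    for step in models:  # one imagined rollout per ensemble member
        s, ret, disc = s0, 0.0, 1.0
        depth_estimates = []
        for a in actions:
            ret += disc * reward_fn(s, a)  # accumulate discounted reward
            s = step(s, a)                 # advance the learned dynamics
            disc *= gamma
            # h-step return: rewards so far plus bootstrapped terminal value
            depth_estimates.append(ret + disc * value_fn(s))
        # average the estimates over all rollout depths 1..H
        per_model.append(np.mean(depth_estimates))
    per_model = np.array(per_model)
    # ensemble mean minus a disagreement penalty (assumed: standard deviation)
    return per_model.mean() - penalty * per_model.std()


# Toy stand-ins (assumptions): linear dynamics that differ only in their gain.
models = [lambda s, a, k=k: k * s + a for k in (0.9, 1.0, 1.1)]
reward_fn = lambda s, a: -float(s @ s) - 0.1 * float(a @ a)
value_fn = lambda s: -float(s @ s)

s0 = np.array([1.0, -0.5])
actions = [np.zeros(2) for _ in range(3)]

score = planner_objective(s0, actions, models, reward_fn, value_fn)
```

In a planner such as MPPI or CEM, a score like this would be computed for every sampled action sequence. Sequences on which the ensemble members disagree are ranked lower, which steers the planner away from regions where the learned model is unreliable.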
Summary generated by Claude — human-verified