Moment Matching Q-Learning
Signal: 72
Hype: 18
In three lines
MoMa QL leverages maximum mean discrepancy (MMD) to accelerate inference of score-based and flow-based generative models in RL. The method guarantees distribution-level convergence and shows superior performance in offline-to-online RL tasks on D4RL benchmarks.
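For readers unfamiliar with MMD, the quantity named in the summary can be illustrated with a minimal sketch. This is the standard biased (V-statistic) estimator of squared MMD with a Gaussian kernel, not the paper's training objective. The function names, the `bandwidth` parameter, and the toy samples are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel between two equal-length vectors (hypothetical helper)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two sample sets.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    It is zero when the two sample sets coincide and grows as they differ.
    """
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Toy samples standing in for actions from two policies (assumed data).
target_actions = [[0.0, 0.0], [1.0, 0.5]]
shifted_actions = [[3.0, 3.0], [4.0, 3.5]]

gap_same = mmd_squared(target_actions, target_actions)
gap_shifted = mmd_squared(target_actions, shifted_actions)
```

In a moment-matching setup, a quantity like this would typically be minimized so that samples from a fast generator match those of a slower reference model at the distribution level; how MoMa QL does this in detail is described in the source, not here.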
Summary generated by Claude — human-verified