arXiv cs.LG

Moment Matching Q-Learning

Signal: 72
Hype: 18
In three lines
MoMa QL leverages maximum mean discrepancy (MMD) to accelerate inference of score-based and flow-based generative models in RL. The method guarantees distribution-level convergence and shows superior performance in offline-to-online RL tasks on D4RL benchmarks.
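For readers unfamiliar with MMD, here is a minimal sketch of a sample-based estimate of squared MMD between two sets of points. It is not the MoMa QL method and is not taken from the paper. The Gaussian kernel, the bandwidth of 1.0, the toy Gaussian data, and every function name are assumptions chosen only to illustrate what a distribution-level discrepancy measures.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_gaussian(rng, n, mean, dim=2):
    """Hypothetical toy data: n points from an isotropic unit-variance Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
reference = sample_gaussian(rng, 100, mean=0.0)
matched = sample_gaussian(rng, 100, mean=0.0)   # same distribution as reference
shifted = sample_gaussian(rng, 100, mean=2.0)   # mean moved away from reference

mmd_matched = mmd_squared(reference, matched)
mmd_shifted = mmd_squared(reference, shifted)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

The estimate is close to zero when both samples come from the same distribution and grows as the distributions move apart, which is the property that makes MMD usable as a training signal for matching a generator's outputs to a target distribution.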
Reinforcement learning · Reasoning · Benchmarks

Summary generated by Claude — human-verified