arXiv cs.AI

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

Signal 72 · Hype 28
In three lines
COOPO is a hybrid offline-online reinforcement learning algorithm that alternates, in repeated cycles, between KL-regularized offline training and online fine-tuning. Periodically returning to offline training is intended to prevent the catastrophic forgetting and distribution drift that can build up during online fine-tuning. On D4RL benchmarks, the authors report that COOPO needs fewer online interactions and reaches higher final returns than state-of-the-art hybrid methods.
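To make the cycle concrete, here is a toy sketch in Python on a k-armed bandit rather than the D4RL control tasks the paper evaluates on. It is not COOPO itself: the bandit setting, the closed-form update, the phase lengths, and the names `kl_regularized_policy`, `estimate`, `cyclic_train`, `beta_offline`, and `beta_online` are illustrative assumptions. Each cycle opens with an offline phase that fits a policy to every sample gathered so far under a KL penalty toward the buffer's empirical action distribution, then runs an online phase that collects fresh data and fine-tunes with a weaker KL anchor to the offline policy. For a finite action set, the maximizer of E_pi[q] - beta * KL(pi || ref) has the standard closed form pi(a) ∝ ref(a) * exp(q(a) / beta), which stands in here for the gradient-based training a real implementation would need.

```python
import math
import random


def kl_regularized_policy(q, ref, beta):
    """Maximize E_pi[q] - beta * KL(pi || ref) over a finite action set.

    The maximizer has the closed form pi(a) ∝ ref(a) * exp(q(a) / beta).
    Requires every ref(a) > 0 and beta > 0.
    """
    logits = [math.log(r) + qa / beta for qa, r in zip(q, ref)]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def estimate(buffer, k):
    """Per-action mean reward (0.0 for unseen actions) and an add-one
    smoothed empirical action distribution from (action, reward) pairs."""
    counts = [0] * k
    sums = [0.0] * k
    for a, r in buffer:
        counts[a] += 1
        sums[a] += r
    q = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    n = len(buffer)
    ref = [(c + 1) / (n + k) for c in counts]
    return q, ref


def cyclic_train(env, dataset, k, cycles=5, online_steps=200,
                 beta_offline=0.1, beta_online=0.05, seed=0):
    """Hypothetical cyclic offline-online loop on a k-armed bandit.

    env(a, rng) returns a reward for action a; dataset is a list of
    (action, reward) pairs logged by some behavior policy.
    """
    rng = random.Random(seed)
    buffer = list(dataset)  # offline data; online samples are appended to it
    pi = None
    for _ in range(cycles):
        # Offline phase: KL-regularized fit to all data collected so far,
        # anchored to the buffer's empirical (behavior) action distribution.
        q, ref = estimate(buffer, k)
        pi = kl_regularized_policy(q, ref, beta_offline)
        anchor = pi
        # Online phase: act with the current policy and store the new samples.
        for _ in range(online_steps):
            a = rng.choices(range(k), weights=pi)[0]
            buffer.append((a, env(a, rng)))
        # Online fine-tuning: refit q and apply a weaker KL anchor
        # to the policy produced by the offline phase.
        q, _ = estimate(buffer, k)
        pi = kl_regularized_policy(q, anchor, beta_online)
    return pi


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]
    dataset = [(0, 0.2)] * 50 + [(1, 0.5)] * 10  # arm 2 never appears offline
    pi = cyclic_train(lambda a, rng: means[a], dataset, k=3)
    print([round(p, 3) for p in pi])
```

In this sketch, the cyclic part is simply that online samples are appended to the buffer and the next offline phase refits on all of it; whether COOPO schedules, weights, or regularizes its phases this way is not stated in the summary.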
Reinforcement learning · Papers · Benchmarks

Summary generated by Claude — human-verified