ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization
Signal: 72
Hype: 15
In three lines
ISEP proposes an offline reinforcement learning method that implicitly expands action support by interpolating between in-distribution data and policy samples. A stochastic mechanism alternates between conservative cloning and optimistic expansion signals, implemented via Conditional Flow Matching with classifier-free guidance.
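To make the ingredients named above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The linear flow matching path and the classifier-free guidance formula are the standard textbook forms. The `pick_target` function, its parameters `p_clone` and `alpha`, and the way the two signals are mixed are assumptions made for illustration only; the summary does not specify them.

```python
import random


def cfm_pair(x0, x1, t):
    """Standard linear flow matching path: return the point x_t and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def guided_velocity(v_uncond, v_cond, w):
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    return [u + w * (c - u) for u, c in zip(v_uncond, v_cond)]


def pick_target(a_data, a_policy, p_clone, alpha, rng):
    """Hypothetical stochastic target selection (an assumption, not from the source).

    With probability p_clone, use the dataset action unchanged (conservative cloning).
    Otherwise, interpolate from the dataset action toward the policy sample
    by a factor alpha (optimistic expansion).
    """
    if rng.random() < p_clone:
        return list(a_data), "clone"
    mixed = [(1.0 - alpha) * d + alpha * p for d, p in zip(a_data, a_policy)]
    return mixed, "expand"


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2)]
    target, mode = pick_target([0.2, -0.1], [0.6, 0.3], p_clone=0.5, alpha=0.5, rng=rng)
    x_t, v = cfm_pair(noise, target, t=rng.random())
    print(mode, x_t, v)
```

In a full method, a learned network would regress onto `v_target` at `x_t`, and `guided_velocity` would combine its conditional and unconditional outputs at sampling time. Both of those steps are omitted here.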
Summary generated by Claude — human-verified