arXiv cs.AI

Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization

Signal: 72
Hype: 15
In three lines: A novel adversarial imitation learning algorithm that combines off-policy learning with double Q-network stabilization. It reduces the sample inefficiency of GAIL by removing both the dependence on an on-policy algorithm (TRPO) and the need for reward engineering.
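To make "double Q-network stabilization" concrete, here is a minimal sketch of one common reading of the idea: the bootstrapped critic target takes the smaller of two Q estimates, and the reward comes from a discriminator rather than the environment. The summary does not give the paper's exact formulas, so everything here is an assumption for illustration: the function names `discriminator_reward` and `double_q_target` are hypothetical, the reward form `-log(1 - D)` is one standard GAIL-style convention, and the min-of-two-critics target is the clipped double-Q variant rather than a confirmed detail of this paper.

```python
import math


def discriminator_reward(d_prob, eps=1e-8):
    """Surrogate reward from a discriminator output D(s, a) in (0, 1).

    Assumption: D estimates the probability that (s, a) came from the
    expert, so pairs that look more expert-like earn a higher reward.
    """
    return -math.log(1.0 - d_prob + eps)


def double_q_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Bootstrapped critic target using the smaller of two Q estimates.

    Taking the minimum counteracts the overestimation bias that a single
    critic tends to accumulate when trained off-policy from a replay buffer.
    `done` is 1.0 for terminal transitions and 0.0 otherwise.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


# Usage: one replayed transition scored by the discriminator.
r = discriminator_reward(0.5)
y = double_q_target(r, q1_next=5.0, q2_next=3.0, done=0.0, gamma=0.9)
```

Because the target depends only on stored transitions and the current discriminator, it can be computed from a replay buffer, which is what allows data to be reused instead of discarded after each on-policy update.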
Reinforcement learning · AI Agents · Papers

Summary generated by Claude — human-verified