What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
Signal: 78
Hype: 25
In three lines
SERL, a selective environment-reweighted learning framework, improves multi-turn LLM agent training by leveraging granular environmental feedback (error messages, page changes, reference trajectories). On ALFWorld and WebShop, SERL achieves 90.0% and 80.1% success rates, respectively, outperforming existing RL and distillation baselines.
Summary generated by Claude — human-verified