arXiv cs.AI

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

Signal: 78
Hype: 25
In three lines
SERL, a selective environment-reweighted learning framework, improves multi-turn LLM agent training by leveraging granular environmental feedback (error messages, page changes, reference trajectories). On ALFWorld and WebShop, SERL achieves success rates of 90.0% and 80.1%, respectively, outperforming existing RL and distillation baselines.
AI Agents, Reinforcement learning, Reasoning

Summary generated by Claude — human-verified