SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
Signal: 78 | Hype: 18
In three lines
SPADER is a reinforcement learning (RL) framework for tool-augmented LLM agents in Multi-Answer QA. It introduces Step-wise Peer Advantage (SPA) for fine-grained credit assignment over long trajectories, and a diversity-aware exploration reward that promotes the discovery of rare entities. Evaluated on QAMPARI, Mintaka, WebQSP, and QUEST, it improves recall and F1 over prompting and supervised RL baselines.
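For readers unfamiliar with how recall and F1 are scored when a question has several correct answers, here is a minimal sketch of set-level scoring. It is an illustration, not SPADER's evaluation code: the function names `normalize` and `set_prf` are hypothetical, and matching by exact string after lowercasing is an assumption (benchmarks often also accept aliases).

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (assumed matching rule, not from the source)."""
    return " ".join(answer.lower().split())


def set_prf(predicted, gold):
    """Return (precision, recall, f1) computed over normalized answer sets."""
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    hits = len(pred & ref)  # answers that appear in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: the agent finds 2 of 4 gold answers plus 1 wrong one.
p, r, f = set_prf(["Paris", "lyon", "Berlin"], ["Paris", "Lyon", "Marseille", "Nice"])
print(round(p, 3), round(r, 3), round(f, 3))
```

Recall is the quantity that rewards finding additional correct answers, which is why a method aimed at discovering rare entities would be expected to move it.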
Summary generated by Claude — human-verified