arXiv cs.AI · 19 May 2026

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning


In three lines: SD-Search introduces on-policy hindsight self-distillation for search-augmented reasoning agents. A single model acts as both student and teacher: the teacher, conditioned on the outcomes of past rollouts, guides the student through a token-level Jensen-Shannon divergence applied at query positions. No external teacher model or additional annotations are needed.
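To make the distillation signal concrete, here is a minimal sketch of a token-level Jensen-Shannon divergence averaged over query positions only. It is an illustration, not the paper's implementation: the function names, the plain-list logit format, the boolean `query_mask`, the equal 0.5 mixture weights, and the simple mean over positions are all assumptions. In the setup described above, `teacher_logits` would come from the same model conditioned on hindsight information; gradient handling is not shown.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jsd(p, q):
    """Jensen-Shannon divergence with equal weights (an assumption)."""
    mix = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def query_jsd_loss(student_logits, teacher_logits, query_mask):
    """Mean JSD over positions flagged as search-query tokens.

    student_logits, teacher_logits: one list of vocabulary logits per token.
    query_mask: one boolean per token, True where the token is part of a query
    (hypothetical format chosen for this sketch).
    """
    per_token = [
        jsd(softmax(s), softmax(t))
        for s, t, is_query in zip(student_logits, teacher_logits, query_mask)
        if is_query
    ]
    return sum(per_token) / len(per_token) if per_token else 0.0
```

Positions outside the mask contribute nothing, so only the tokens that form search queries are pulled toward the hindsight-conditioned distribution. The divergence is symmetric and bounded above by ln 2, which keeps the per-token signal on a fixed scale.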


Reasoning · Reinforcement learning · RAG

Summary generated by Claude — human-verified
