arXiv cs.AI

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Signal: 75
Hype: 15
In three lines:
A large-scale literature search study finds that a Deep Research pipeline raises recall from below 20% to above 80% on RollingEval-Jun25, a 250-paper benchmark.
A critical analysis of human reference lists as ground truth: only 51% are judged moderately relevant, versus 86-88% for the best AI re-rankers.
Humans cite direct collaborators 2.5x more often.
RAG, Evals, Benchmarks

Summary generated by Claude — human-verified