arXiv cs.AI

Skill-Augmented AI Agents for Medical Research Analysis: An Exploratory Multi-Model Human Evaluation in an NSCLC Transcriptomic Biomarker Task

Signal: 45
Hype: 25
In three lines: An exploratory study compares AI agents with autonomous access to medical research skills against native models on an NSCLC transcriptomic biomarker analysis task. Six model backbones were tested, and 21 outputs were evaluated by experts and non-experts. Skill-augmented outputs scored directionally higher (5.50 vs 5.11), but the difference was not statistically significant (p=0.156), and agreement among expert raters was limited (ICC=-0.15).
Tags: AI Agents, Benchmarks, Evals, AI safety

Summary generated by Claude — human-verified