arXiv cs.AI

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

Signal: 75 · Hype: 15
In three lines:
MeDial-Speech is a dataset of more than 111 hours of spoken medical dialogues, both robot-patient and doctor-patient, covering four health conditions.
Three LLMs (GPT-4 mini, DeepSeek-V3, Claude Sonnet 4) are benchmarked on a sentence-selection task, where Claude Sonnet 4 reaches 71.1% accuracy.
The evaluation reveals systematic overconfidence in the models' predictions.
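The summary does not say how overconfidence was measured. As a hedged illustration only, the sketch below shows one common way to quantify it on a sentence-selection benchmark: compare a model's stated confidence with its accuracy, overall (an "overconfidence gap") and per confidence bin (expected calibration error, ECE). The `Prediction` record, the 10-bin scheme, and the toy data are hypothetical assumptions, not the paper's protocol or results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    chosen: int        # index of the sentence the model selected (hypothetical field)
    gold: int          # index of the correct sentence (hypothetical field)
    confidence: float  # model's stated confidence in [0, 1] (hypothetical field)


def accuracy(preds: List[Prediction]) -> float:
    """Fraction of items where the selected sentence matches the gold one."""
    return sum(p.chosen == p.gold for p in preds) / len(preds)


def mean_confidence(preds: List[Prediction]) -> float:
    """Average confidence the model assigned to its own choices."""
    return sum(p.confidence for p in preds) / len(preds)


def overconfidence_gap(preds: List[Prediction]) -> float:
    """Positive values mean the model claims more certainty than its accuracy supports."""
    return mean_confidence(preds) - accuracy(preds)


def expected_calibration_error(preds: List[Prediction], n_bins: int = 10) -> float:
    """Weighted average of |confidence - accuracy| over equal-width confidence bins."""
    total = len(preds)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); the last bin also includes confidence == 1.0.
        in_bin = [
            p for p in preds
            if lo <= p.confidence < hi or (b == n_bins - 1 and p.confidence == 1.0)
        ]
        if not in_bin:
            continue
        gap = abs(mean_confidence(in_bin) - accuracy(in_bin))
        ece += len(in_bin) / total * gap
    return ece


if __name__ == "__main__":
    # Toy, made-up predictions purely to exercise the functions.
    preds = [
        Prediction(chosen=0, gold=0, confidence=0.90),
        Prediction(chosen=1, gold=2, confidence=0.90),
        Prediction(chosen=2, gold=2, confidence=0.80),
        Prediction(chosen=0, gold=1, confidence=0.95),
    ]
    print("accuracy:", accuracy(preds))
    print("overconfidence gap:", round(overconfidence_gap(preds), 4))
    print("ECE:", round(expected_calibration_error(preds), 4))
```

A positive gap together with a large ECE is the usual signature of overconfidence. Equal-width bins are the simplest choice; the paper may use a different calibration measure.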
Tags: Benchmarks · Claude · DeepSeek · Voice

Summary generated by Claude — human-verified