A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks
Signal: 75
Hype: 15
In three lines
MeDial-Speech: a dataset of 111+ hours of spoken medical dialogues (robot-patient and doctor-patient) covering 4 health conditions. A benchmark of 3 LLMs (GPT-4 mini, DeepSeek-V3, Claude Sonnet 4) on a sentence-selection task: Claude Sonnet 4 achieves 71.1% accuracy. The results reveal systematic overconfidence in model predictions.
Summary generated by Claude — human-verified