arXiv cs.AI·27 May 2026

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

In three lines: MeDial-Speech is a dataset of 111+ hours of spoken medical dialogues (robot-patient and doctor-patient) covering four health conditions. The paper benchmarks three LLMs (GPT-4 mini, DeepSeek-V3, Claude Sonnet 4) on a sentence-selection task, where Claude Sonnet 4 achieves 71.1% accuracy. The results reveal systematic overconfidence in model predictions.

Benchmarks Claude DeepSeek Voice

Summary generated by Claude — human-verified
