arXiv cs.CL · 29 May 2026

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

In three lines: A comparative study of 9 ASR models (Whisper, Parakeet, Wav2Vec2) on Dutch child speech. Fine-tuned Whisper-medium achieves a word error rate (WER) of 5.54% on JASMIN and 70.37% on DART. An utterance-level selection method identifies 42% (JASMIN) and 18.1% (DART) of utterances as correctly pronounced with ≥98.3% precision, reducing the need for manual verification.
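
For readers unfamiliar with the WER metric quoted above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the Dutch example sentence are illustrative assumptions, not taken from the paper, and the paper's own scoring may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one dropped word out of six reference words.
example = word_error_rate("de kat zit op de mat", "de kat zit op mat")
```

Because insertions count as errors while the denominator is the reference length, WER is not capped at 100%.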

Benchmarks Voice Evals

Summary generated by Claude — human-verified
