Reddit r/MachineLearning · 9 June 2026

What will be the next breakthrough in ASR? [D]

In three lines: The strongest ASR models now come from large-scale supervised training, led by Whisper-large-v3 (5M hours of audio) and Nvidia Parakeet v3 (660k hours). Newer architectures (Transducers, Token-and-Duration Transducers, and Qwen's attention-based encoder-decoder) are displacing the earlier recipe of CTC on top of self-supervised pretraining. Open question: will self-supervised methods such as data2vec 2.0 and WavLM fade out for ASR, or deliver a 'DINO moment' for speech?
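As a rough illustration of the decoding difference between CTC and duration-predicting transducers (a toy sketch, not code from the thread or from any ASR library), the block below compares greedy CTC decoding, which picks one label per frame and then merges repeats and drops blanks, with a simplified greedy Token-and-Duration Transducer loop, where each step also predicts how many frames to skip. The `predict` callable, the blank id `0`, and the `toy` decision table are hypothetical stand-ins for a trained joint network; a real transducer conditions on a learned prediction-network state rather than on a lookup table.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical blank token id used by both toy decoders


def ctc_greedy_decode(frame_argmax: Sequence[int], blank: int = BLANK) -> List[int]:
    """Greedy CTC: take the best label per frame, merge repeats, drop blanks."""
    out: List[int] = []
    prev = None
    for tok in frame_argmax:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out


# Hypothetical stand-in for a trained TDT joint network: given the current
# frame index and the tokens emitted so far, return (token, duration).
Predictor = Callable[[int, Tuple[int, ...]], Tuple[int, int]]


def tdt_greedy_decode(
    predict: Predictor,
    num_frames: int,
    blank: int = BLANK,
    max_symbols_per_frame: int = 5,
) -> Tuple[List[int], int]:
    """Simplified greedy TDT loop; returns (tokens, number of joint calls)."""
    out: List[int] = []
    t = 0
    calls = 0
    symbols_here = 0
    while t < num_frames:
        tok, dur = predict(t, tuple(out))
        calls += 1
        if tok != blank:
            out.append(tok)
            symbols_here += 1
        # A blank must move forward, and emissions on one frame are capped
        # so a run of zero-duration predictions cannot loop forever.
        if dur == 0 and (tok == blank or symbols_here >= max_symbols_per_frame):
            dur = 1
        if dur > 0:
            symbols_here = 0
        t += dur  # skip ahead by the predicted duration
    return out, calls


if __name__ == "__main__":
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]

    # Toy decisions keyed by (frame, number of tokens emitted so far).
    toy = {(0, 0): (3, 2), (2, 1): (BLANK, 3), (5, 1): (5, 0), (5, 2): (7, 3)}
    print(tdt_greedy_decode(lambda t, prefix: toy[(t, len(prefix))], num_frames=8))
    # ([3, 5, 7], 4): four joint evaluations for eight frames
```

The contrast the sketch is meant to show: CTC scores every frame and relies on blanks and repeat-merging to produce the transcript, while a TDT-style decoder can jump over frames it predicts carry no new token, which is the intended source of its faster inference.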

Voice · Benchmarks · Open source

Summary generated by Claude (human-verified)