Speculative Decoding for 2x Faster Whisper Inference
Hugging Face implements speculative decoding to accelerate Whisper inference by 2x. A lightweight draft model generates a short run of candidate tokens, and the full model validates all of them in a single parallel pass. Only candidates the full model agrees with are kept, so latency drops without quality loss.
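
To make the draft-then-verify loop concrete, here is a minimal sketch of the greedy variant of speculative decoding. It is not Hugging Face's implementation and does not load Whisper: `target_next` and `draft_next` are hypothetical toy functions standing in for the full and lightweight models, and the names `speculative_decode`, `greedy_decode`, and the draft length `k` are assumptions made for illustration. The sketch also omits the extra "bonus" token that a real implementation can take from the full model when every candidate is accepted.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(prefix: List[int]) -> int:
    """Toy 'full' model (assumption: any deterministic function works here)."""
    return (sum(prefix) * 31 + len(prefix) * 7) % 50


def draft_next(prefix: List[int]) -> int:
    """Toy 'lightweight' model: agrees with the full model except when
    the prefix length is a multiple of 4, where it deliberately guesses wrong."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 != 0 else (guess + 1) % 50


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Return the generated tokens and the number of verification passes."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    target_passes = 0
    while len(tokens) < limit:
        n = min(k, limit - len(tokens))

        # 1. The draft model proposes n candidate tokens autoregressively.
        candidates: List[int] = []
        for _ in range(n):
            candidates.append(draft(tokens + candidates))

        # 2. The full model scores every candidate position. A transformer
        #    would do this in one batched forward pass; the loop below only
        #    emulates that, so it is counted as a single pass.
        target_passes += 1
        verified = [target(tokens + candidates[:i]) for i in range(n)]

        # 3. Keep the longest prefix on which both models agree, then take
        #    the full model's own token at the first disagreement.
        accepted = 0
        while accepted < n and candidates[accepted] == verified[accepted]:
            accepted += 1
        tokens.extend(candidates[:accepted])
        if accepted < n:
            tokens.append(verified[accepted])
    return tokens, target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
fast, passes = speculative_decode(target_next, draft_next, prompt, 20)
print(fast == baseline, passes)
```

Because a candidate is accepted only when it equals the token the full model would have chosen, the speculative output matches the baseline greedy output exactly, while the full model runs far fewer sequential passes than the number of generated tokens. The actual speedup depends on how often the draft model agrees with the full model and on how cheap the draft model is to run.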