arXiv cs.CL·26 May 2026

Raon-Speech Technical Report

In three lines: Raon-Speech is a 9B-parameter multilingual speech language model (English/Korean) that understands and generates speech while preserving text capabilities. Trained on 1.38M hours of curated data, it outperforms 8 comparable audio models (including Qwen2.5-Omni and Fun-Audio-Chat) across 42 benchmarks. Raon-SpeechChat extends it with real-time full-duplex conversation, trained on 119K hours of dialogue.

Voice Benchmarks Open source Multi-agent

Summary generated by Claude — human-verified

