arXiv cs.CL·28 May 2026

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

In three lines

Spoken Language Models (SLMs) for speech synthesis in low-resource languages face a trade-off: synthetic data improves phonetic accuracy but suppresses prosodic variability (Synthetic Erosion). The authors propose two self-alignment frameworks, DGSA and TDSC, to recover expressivity. The resulting models reportedly outperform ElevenLabs and Gemini Pro and enable zero-shot voice cloning for Lao.


Voice Papers Reasoning Alignment

Summary generated by Claude — human-verified
