arXiv cs.CL · 28 May 2026

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

In three lines: A fine-grained style control technique for prompt-based TTS models. Inter-utterance control interpolates along direction vectors in embedding space (99-100% gender conversion success, 36 Hz pitch variation). Intra-utterance transitions use KV-cache swapping and sliding-window attention masking (speaker similarity 0.81-0.91).
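
As a rough illustration of the two ideas named in the summary, the sketch below shifts an embedding along a style direction and builds a causal sliding-window attention mask. It is a toy under stated assumptions, not the paper's implementation: the summary does not say how the direction vectors are derived, so the difference of centroids used here is an assumption, and every function name and number is hypothetical. KV-cache swapping is not shown.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    return [fmean(dim) for dim in zip(*vectors)]


def style_direction(source_embs, target_embs):
    """Assumed recipe: vector from the source-style centroid to the target-style centroid."""
    src = mean_vector(source_embs)
    tgt = mean_vector(target_embs)
    return [t - s for s, t in zip(src, tgt)]


def shift_embedding(embedding, direction, alpha):
    """Move an embedding alpha steps along the direction (alpha=0 leaves it unchanged)."""
    return [e + alpha * d for e, d in zip(embedding, direction)]


def sliding_window_mask(seq_len, window):
    """Causal mask: position i may attend only to positions i-window+1 .. i."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


# Toy 2-D "style embeddings" (made-up numbers, for illustration only).
low_pitch = [[0.0, 1.0], [0.0, 3.0]]
high_pitch = [[2.0, 2.0], [4.0, 2.0]]

direction = style_direction(low_pitch, high_pitch)
halfway = shift_embedding([1.0, 1.0], direction, alpha=0.5)
mask = sliding_window_mask(seq_len=4, window=2)
```

Varying `alpha` continuously is what makes the control fine-grained in this sketch; the window size bounds how far back each position can look, which is one plausible way to limit how much earlier-style context carries past a transition point.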

Summary generated by Claude — human-verified
