arXiv cs.CL

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

Signal: 78 · Hype: 15
In three lines
Researchers apply Direct Preference Optimization (DPO) to improve English-Mandarin code-switching transcription in Audio LLMs.
They identify three failure modes: language omission, translation instead of transcription, and hallucination.
Training on 100K pairs (570 hours) reduces MER by up to 89.6% in-distribution and 20.0% out-of-distribution.
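For readers unfamiliar with DPO, here is a minimal sketch of the standard per-pair DPO loss, applied to a transcription preference pair of the kind the summary implies (a faithful code-switched transcript preferred over a translated one). The example pair, the log-probability values, and `beta=0.1` are illustrative assumptions, not details taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full transcript
    under the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Hypothetical preference pair (not from the paper): the rejected output
# shows the "translation instead of transcription" failure mode.
pair = {
    "chosen": "我们明天 meeting 的时候再 discuss",
    "rejected": "We will discuss it at tomorrow's meeting",
}

# Made-up log-probabilities, for illustration only.
loss = dpo_loss(policy_chosen=-8.0, policy_rejected=-14.0,
                ref_chosen=-10.0, ref_rejected=-10.0)
print(f"DPO loss for the example pair: {loss:.4f}")
```

The loss equals log 2 when the policy matches the reference, and falls below it as the policy assigns relatively more probability to the chosen transcript than to the rejected one.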
Reinforcement learning · Alignment · Voice · Benchmarks · Papers

Summary generated by Claude — human-verified