arXiv cs.CL · 20 May 2026

FormalASR: End-to-End Spoken Chinese to Formal Text

In three lines: FormalASR introduces two compact models (0.6B and 1.7B parameters) that transcribe spoken Chinese directly into formal written text, without an ASR+LLM pipeline. Both are produced by supervised fine-tuning of Qwen3-ASR on WenetSpeech-Formal and Speechio-Formal. They achieve a 37.4% relative CER reduction over verbatim baselines and improve ROUGE-L and BERTScore.
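For readers unfamiliar with the metric, the sketch below shows how a character error rate (CER) and a relative CER reduction are commonly computed. It is a minimal illustration under the standard definition (character-level edit distance divided by reference length), not the paper's evaluation code; the function names and the toy strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction = (baseline - system) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) / baseline


# Toy example (made-up strings): the formal reference omits a spoken filler
# that a verbatim transcript keeps, so the verbatim output is penalized.
formal_ref = "我想订一张机票"
verbatim_hyp = "嗯我想订一张机票"
toy_cer = cer(formal_ref, verbatim_hyp)  # 1 extra character over 7 reference characters
```

Because the reported 37.4% is relative, the absolute gain depends on the baseline: a hypothetical baseline CER of 10% would fall to about 6.26%, not to a negative or near-zero value.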

Summary generated by Claude — human-verified
