Reddit r/LocalLLaMA·23 May 2026

meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

In three lines: Meituan releases LongCat-Video-Avatar 1.5, an open-source framework for audio-driven human avatar video generation. It upgrades the audio encoder from Wav2Vec2 to Whisper-Large and supports Audio-Text-to-Video and Video Continuation with 8-step inference. Human evaluation covers 508 image-audio pairs across 6 scenarios and 2 languages.

Video generation Open source Benchmarks Voice

Summary generated by Claude — human-verified
