Reddit r/LocalLLaMA

meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

Signal: 75
Hype: 35
In three lines: Meituan releases LongCat-Video-Avatar 1.5, an open-source framework for audio-driven human avatar video generation. It upgrades the audio encoder from Wav2Vec2 to Whisper-Large and supports Audio-Text-to-Video and Video Continuation with 8-step inference. Human evaluation covers 508 image-audio pairs across 6 scenarios and 2 languages.
Video generation · Open source · Benchmarks · Voice

Summary generated by Claude, human-verified