arXiv cs.AI

MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation

Signal: 72 · Hype: 28
In three lines: MAVEN is a multi-agent prompt refinement framework that improves cultural fidelity in text-to-video generation. It decomposes each prompt into person, action, and location dimensions, each handled by a specialized agent. The accompanying benchmark comprises 243 culturally grounded prompts and 972 videos (Chinese, American, Romanian), evaluated with CLIP and a VLM-as-judge.
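
To make the decomposition concrete, here is a minimal sketch of how a prompt might be routed through one agent per dimension. Everything in it is an assumption made for illustration: the names `RefinedPrompt`, `make_stub_agent`, and `refine` are hypothetical, and the stub agents simply return a tagged note where the actual framework would presumably call a language model. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The three dimensions named in the summary.
DIMENSIONS = ("person", "action", "location")

# Hypothetical agent signature: (prompt, culture) -> refinement text.
Agent = Callable[[str, str], str]


@dataclass
class RefinedPrompt:
    original: str
    culture: str
    details: Dict[str, str]

    def render(self) -> str:
        """Append the per-dimension refinements to the original prompt."""
        parts = [self.details[d] for d in DIMENSIONS if self.details.get(d)]
        if not parts:
            return self.original
        return f"{self.original} ({'; '.join(parts)})"


def make_stub_agent(dimension: str) -> Agent:
    """Placeholder for a model-backed agent (assumption, not the paper's agent)."""
    def agent(prompt: str, culture: str) -> str:
        return f"{dimension}: depict in a {culture} context"
    return agent


def refine(prompt: str, culture: str, agents: Dict[str, Agent]) -> RefinedPrompt:
    """Ask each dimension's agent for a refinement and collect the results."""
    details = {dim: agents[dim](prompt, culture) for dim in DIMENSIONS}
    return RefinedPrompt(prompt, culture, details)


agents = {dim: make_stub_agent(dim) for dim in DIMENSIONS}
result = refine("A family shares a holiday meal", "Romanian", agents)
print(result.render())
```

Keeping each dimension behind its own callable means a stub can be swapped for a real model call without touching the routing logic.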
Multi-agent · Video generation · Benchmarks · Prompt engineering

Summary generated by Claude — human-verified