Reddit r/LocalLLaMA

NVIDIA releases Cosmos 3 Omnimodal world models on HF

Signal: 75
Hype: 25
In three lines: NVIDIA releases Cosmos 3, a collection of omnimodal world models (Nano 16B, Super 64B) that can generate dynamic video, images, audio, and action commands from text, image, video, and action-trajectory inputs. The models are available on Hugging Face for Physical AI applications.
Video generation, Image generation, Open source, Robotics

Summary generated by Claude — human-verified