Topic

#Video generation

AI video generation refers to the automatic creation of animated sequences from text, images, or audio inputs. Sora (OpenAI) is a notable example of a model that produces realistic clips from a plain text description.

40 Articles

14 Sources

62 Avg. signal

Reddit r/LocalLLaMA·Jun 17

GLM 5.2 Release Video [Made with GLM 5.2]

GLM 5.2 generates videos through Remotion, the React-based framework for programmatic video. Its output is reportedly comparable to Fable but below Gemini 3.1 Pro. Users observed server overload on OpenRouter, with timeouts on long outputs.

Video generation Gemini Qwen

GitHub Trending·Jun 17

calesthio / OpenMontage

OpenMontage is an open-source, agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills. It turns an AI coding assistant into a full video production studio.

AI Agents Multi-agent Video generation

Le Big Data·Jun 17

Grok Imagine Video 1.5: this AI now generates videos with sound

xAI makes Grok Imagine Video 1.5 available; its video generation model can now produce videos with synchronized audio.

Video generation

Reddit r/MachineLearning·Jun 17

Mel AI just shared a demo of video-native AI characters that can talk, react, and respond to camera context in real time [N]

Mel AI demonstrates video-native AI characters that talk, lip-sync, show facial reactions, and respond in real time to camera context. The system detects the user's environment and adapts its responses accordingly, going beyond text-based Character AI (founded by former Google LaMDA developers).

AI Agents Vision Voice

The Decoder·Jun 14

Microsoft Research's Mirage gives video generation a persistent spatial memory that doesn't forget what's around the corner

Mirage, a video world model from Microsoft Research, stores scene information in latent space instead of pixel-based point clouds. This cuts compute time and graphics memory while maintaining spatial consistency through long camera moves. Object tracking across segments remains unreliable.

Video generation Papers DeepMind

arXiv cs.CL·Jun 12

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

A new arXiv paper introduces MINARD, a video generation system that turns scientific figures into narrated walkthrough videos with region grounding. The pipeline generates paper-grounded narrations and sequentially aligns them to figure regions. The paper includes the FigTalk benchmark with component-level grounding metrics.

Video generation Vision Benchmarks

Reddit r/MachineLearning·Jun 11

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting [R]

An adaptive video tokenisation method that exploits temporal redundancy in a frozen tokeniser's latent space by applying a fixed threshold to per-position temporal L1 differences. A Latent Inpainting Transformer (LIT) reconstructs the dropped positions. The pipeline, a single encoder pass plus one LIT pass, is 31× faster than ElasticTok-CV and 2× faster than InfoTok on the TokenBench and DAVIS benchmarks.

Video generation Benchmarks Papers
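
The thresholding step described in this summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: latents are plain Python lists (one vector per spatial position per frame), each frame is compared with the frame before it, and `tau` is a hypothetical threshold. Where the paper uses the Latent Inpainting Transformer to reconstruct dropped positions, the hypothetical `naive_fill` simply copies the previous frame's value as a placeholder.

```python
from typing import List

Vector = List[float]
Frame = List[Vector]  # one latent vector per spatial position


def temporal_l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two latent vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def redundancy_mask(latents: List[Frame], tau: float) -> List[List[bool]]:
    """Return keep[t][p]: True if position p of frame t must be encoded.

    Frame 0 is always kept. A later position is dropped when its temporal
    L1 difference from the same position in the previous frame is below tau.
    """
    keep = [[True] * len(latents[0])]
    for t in range(1, len(latents)):
        keep.append([
            temporal_l1(cur, prev) >= tau
            for cur, prev in zip(latents[t], latents[t - 1])
        ])
    return keep


def naive_fill(latents: List[Frame], keep: List[List[bool]]) -> List[Frame]:
    """Placeholder for a learned inpainter: reuse the last reconstructed value."""
    out = [list(latents[0])]
    for t in range(1, len(latents)):
        out.append([
            latents[t][p] if keep[t][p] else out[t - 1][p]
            for p in range(len(latents[t]))
        ])
    return out


if __name__ == "__main__":
    # Two positions, three frames: position 0 is static, position 1 moves.
    video = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.01], [1.0, 1.0]],
        [[0.0, 0.0], [2.0, 2.0]],
    ]
    mask = redundancy_mask(video, tau=0.1)
    print(mask)  # [[True, True], [False, True], [False, True]]
    kept = sum(sum(row) for row in mask)
    print(f"kept {kept} of {len(video) * len(video[0])} positions")
```

Raising `tau` drops more positions, trading reconstruction fidelity for fewer tokens; a learned inpainter is what would make aggressive dropping usable in practice.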

Google DeepMind·Jun 9

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google DeepMind releases Gemma 4 12B, a unified encoder-free multimodal model. The model processes text, images, and video in a single architecture, optimized for on-device inference.

Gemini DeepMind Vision

Hugging Face Blog·Jun 7

Room360: Video-to-3D Spatial Reconstruction Platform

Hugging Face introduces Room360, a video-to-3D spatial reconstruction platform. The tool converts video sequences into usable 3D models for immersive and architectural applications.

Video generation Tools Open source

Hugging Face Blog·Jun 4

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

NVIDIA releases Nemotron 3.5 Content Safety, an open-source multimodal safety model detecting harmful content across text, image, and video. Customizable for global enterprises, it provides granular policy control for moderation across regions and use cases.

AI safety Vision Video generation

GitHub Trending·Jun 4

NVIDIA / cosmos

NVIDIA releases Cosmos, an open platform of world models, datasets, and tools enabling developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

Robotics Open source Video generation

GitHub Trending·Jun 4

AIDC-AI / Pixelle-Video

Pixelle-Video is an AI-powered automated short video generation engine. The GitHub project provides an end-to-end solution for creating short videos without manual intervention.

Video generation Open source

The Decoder·Jun 4

xAI updates Grok Imagine to 1.5 with image-to-video generation at 720p resolution

xAI releases grok-imagine-video-1.5-preview, an image-to-video model generating cinematic videos up to 720p from still images and text prompts. Multiple clips can be stitched together into longer scenes.

Video generation Image generation

Vercel AI Blog·Jun 3

Grok Imagine Video 1.5 on AI Gateway

Grok Imagine Video 1.5 from xAI is now available on AI Gateway. The model generates video from an input image, with synchronized audio, in a single pass. Improvements include audio quality, prompt following, photorealism, character consistency across longer sequences, and expanded reference-image support for controlling visual style.

Video generation Tools Infrastructure

Reddit r/LocalLLaMA·Jun 2

NVIDIA releases Cosmos 3 Omnimodal world models on HF

NVIDIA releases Cosmos 3, a collection of omnimodal world models (Nano 16B, Super 64B) capable of generating dynamic video, image, audio, and action commands from text, image, video, and action trajectory inputs. Available on Hugging Face for Physical AI applications.

Video generation Image generation Open source

arXiv cs.LG·Jun 2

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows

GEM is a concept erasure framework for Rectified Flow Transformers. It bridges trajectory-based unlearning (Generative Flow Networks) and teacher-guided erasure, using geometric guidance signals to suppress unwanted concepts while preserving benign generation and preventing harmful content synthesis.

AI safety Alignment Papers

Latent Space·Jun 2

[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

NVIDIA announces Cosmos 3 (video model), Nemotron 3 Ultra (optimized LLM), and RTX Spark. Jensen Huang claims a major win for the company.

Video generation Code generation Infrastructure

Latent Space·Jun 1

Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead

Ethan He, Grok Imagine lead at xAI, discusses building the video generation model in 3 months, compares video generation with the world-model approach, and argues that video agent models are the next frontier.

Video generation AI Agents Reasoning

The Decoder·Jun 1

Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open humanoid robot

Nvidia launches three physical AI models at GTC Taipei: Cosmos 3 (world model), Alpamayo 2 Super (scaled-up autonomous driving model), and an open reference platform for humanoid robots.

Robotics Video generation Open source

The Decoder·May 29

Google fixes several bugs in Gemini usage limits that burned through quotas too fast

Google fixes bugs in Gemini usage limits that let a single Omni video consume an entire quota. Ultra members now get twice as many video generations, failed requests are no longer charged, and Google plans to be more transparent about usage.

Gemini Video generation

The Decoder·May 28

Amazon builds its own AI production platform and greenlights three AI animated series for Prime Video

Amazon MGM Studios and AWS launch a creators' fund and an in-house AI platform called 'Project Nara'. Three animated series are in production, with five-week timelines for pilots. Amazon claims to have the industry's only end-to-end AI content ecosystem.

Image generation Video generation Business

Reddit r/MachineLearning·May 28

Diffusion models for sketch-guided trajectory simulation [R]

Diffusion models applied to basketball trajectory simulation conditioned on partial sketches of player movements. The model jointly refines all player trajectories, producing more natural simulations than autoregressive generation. Code and model fully open-sourced.

Video generation Open source

The Decoder·May 27

YouTube will try to automatically flag AI videos starting this month

YouTube deploys an automatic detection system to flag AI-generated or heavily AI-altered content starting in May 2026. Labels will be displayed more prominently: below the player for long-form videos and as an overlay on Shorts. Recommendations and monetization are unaffected.

Regulation Video generation

Le Big Data·May 27

AI videos: YouTube will finally stop hiding them, with clearly visible labels

YouTube enforces visible labels to identify AI-generated videos. This measure aims to improve transparency and help users distinguish authentic content from synthetic content.

Video generation Regulation AI safety

GitHub Trending·May 27

harry0703 / MoneyPrinterTurbo

MoneyPrinterTurbo: an open-source tool that uses LLMs to generate high-definition short videos with one click, automating video content creation.

Video generation Open source Tools

arXiv cs.LG·May 27

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

Autoregressive video diffusion models use quantized KV caches to reduce memory, but quantization creates an attention bias (Jensen bias) that degrades quality. The authors propose a per-attention-score correction computed from the quantization step sizes; it recovers the quality lost to INT2 quantization while using 50% less memory than INT4.

Video generation Reasoning Benchmarks
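
Why quantized keys bias attention can be seen from Jensen's inequality: softmax weights are exponentials of scores, and zero-mean noise on a score inflates the expected exponential. The sketch below illustrates one way a correction can be computed from step sizes alone; it is an assumption-laden illustration, not the authors' correction. It assumes each key coordinate carries independent rounding error uniform on [-delta_i/2, delta_i/2], in which case E[exp(c·u)] = sinh(c)/c for u uniform on [-1, 1], so the log-bias of a score is a sum of log(sinh(c)/c) terms that can be subtracted. All function names and the example numbers are hypothetical.

```python
import math
import random
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_sinhc(c: float) -> float:
    """log(sinh(c) / c); even in c, and tends to 0 as c -> 0."""
    c = abs(c)
    if c < 1e-6:
        return c * c / 6.0  # leading Taylor term
    return math.log(math.sinh(c) / c)


def score_bias(q: List[float], deltas: List[float], scale: float) -> float:
    """Expected log-inflation of exp(score), ASSUMING each key coordinate k_i
    has independent rounding error uniform on [-delta_i/2, delta_i/2]."""
    return sum(log_sinhc(scale * qi * di / 2.0) for qi, di in zip(q, deltas))


def corrected_score(q, k_quant, deltas, scale):
    """Attention logit on a quantized key, minus the expected Jensen bias."""
    return scale * dot(q, k_quant) - score_bias(q, deltas, scale)


def mean_exp_ratio(q, k, deltas, scale, trials, correct, seed=0):
    """Monte Carlo estimate of E[exp(noisy logit)] / exp(true logit)."""
    rng = random.Random(seed)
    s_true = scale * dot(q, k)
    total = 0.0
    for _ in range(trials):
        k_noisy = [v + rng.uniform(-d / 2, d / 2) for v, d in zip(k, deltas)]
        if correct:
            s = corrected_score(q, k_noisy, deltas, scale)
        else:
            s = scale * dot(q, k_noisy)
        total += math.exp(s - s_true)
    return total / trials


if __name__ == "__main__":
    q = [1.0, -2.0, 0.5, 1.5]
    k = [0.3, 0.1, -0.7, 0.2]
    deltas = [0.5] * 4  # hypothetical, deliberately coarse step size
    raw = mean_exp_ratio(q, k, deltas, 1.0, trials=50_000, correct=False)
    fixed = mean_exp_ratio(q, k, deltas, 1.0, trials=50_000, correct=True)
    # raw should sit noticeably above 1; fixed should be close to 1
    print(f"uncorrected ratio {raw:.3f}, corrected ratio {fixed:.3f}")
```

Because the inflation grows with both |q_i| and the step size, keys stored with coarser steps would receive systematically more softmax weight than they should, which is consistent with the "stealing attention" framing; under the uniform-noise assumption, subtracting the bias restores an unbiased expectation.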

arXiv cs.AI·May 27

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

Tail-Aware HiFloat4 applies W4A4 post-training quantization to the Wan2.2 text-to-video generation model. The method adapts ViDiT-Q using HiFloat4 format, quantizes transformer linear layers, preserves numerically sensitive modules in high precision, and introduces activation-tail-aware percentile calibration to reduce impact of rare outliers.

Video generation Fine-tuning Benchmarks
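
Percentile calibration replaces the absolute maximum with a high percentile of |activation| when choosing the quantization range, so a few rare outliers do not stretch the grid for every other value. The sketch below uses a plain symmetric signed 4-bit integer grid (levels -7..7) as a stand-in; it does not implement the HiFloat4 format, ViDiT-Q, or the paper's tail-aware rule, and the percentile values and activations are hypothetical.

```python
import math
from typing import List


def percentile_abs(values: List[float], p: float) -> float:
    """Nearest-rank p-th percentile (0 < p <= 100) of |values|."""
    mags = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(p * len(mags) / 100.0))
    return mags[rank - 1]


def calibrate_scale(values: List[float], p: float, qmax: int = 7) -> float:
    """Map the p-th percentile magnitude onto the largest integer level."""
    clip = percentile_abs(values, p)
    return clip / qmax if clip > 0 else 1.0


def fake_quant(values: List[float], scale: float, qmax: int = 7) -> List[float]:
    """Round to integers in [-qmax, qmax] (clipping outliers), then rescale."""
    return [scale * max(-qmax, min(qmax, round(v / scale))) for v in values]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical activations: a dense bulk in [-1, 1] plus one outlier.
    acts = [-1.0 + 2.0 * j / 998 for j in range(999)] + [3.0]
    for p in (100.0, 99.0):  # p = 100 is plain max calibration
        s = calibrate_scale(acts, p)
        print(f"p={p}: scale={s:.4f}, mse={mse(acts, fake_quant(acts, s)):.5f}")
```

With p = 100 the single outlier sets the step size and the bulk is quantized coarsely; with p = 99 the outlier is clipped but the bulk gets a finer step. Whether that trade lowers overall error depends on how heavy the tail is, which is the kind of trade-off a tail-aware calibration rule is meant to manage.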

Hacker News (AI)·May 26

Show HN: We made a cinematic heist trailer with 4 AI models for $60

Creators produced a cinematic heist movie trailer by combining 4 AI models for $60, showing that low-cost AI video production is feasible.

Video generation Tools

Hacker News (AI)·May 23

Cannes Film Cost $500k to Make. $400k Was AI Compute Costs

A short film presented at Cannes cost $500k to produce, of which $400k (80%) went to AI compute. The ratio illustrates the growing share of infrastructure costs in video generation and creative content production.

Video generation Business Infrastructure

Reddit r/LocalLLaMA·May 23

meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

Meituan releases LongCat-Video-Avatar 1.5, an open-source framework for audio-driven human avatar video generation. It upgrades the audio encoder from Wav2Vec2 to Whisper-Large and supports Audio-Text-to-Video and Video Continuation with 8-step inference. Human evaluation covers 508 image-audio pairs across 6 scenarios and 2 languages.

Video generation Open source Benchmarks

Replicate Blog·May 21

How to prompt Grok Imagine Video 1.5

xAI releases Grok Imagine Video 1.5, which animates still images into short clips with synchronized audio in a single pass. This guide explains how to maximize output quality.

Video generation Tools

arXiv cs.AI·May 20

PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

PRISM is a benchmark of 10,372 instruction-code pairs for evaluating programmatic video generation by LLMs. It proposes 4 metrics: code reliability, spatial coherence, visual complexity, and temporal density. An evaluation of 7 LLMs reveals a 41% execution-spatial gap: executable code does not guarantee spatially coherent output.

Benchmarks Code generation Video generation

Le Big Data·May 19

Gemini Omni: Google's video AI finally masters physics and consistent characters

At its I/O 2026 conference, Google unveils Gemini Omni, a video AI said to handle physics and maintain character consistency in generated videos.

Gemini Video generation

Reddit r/LocalLLaMA·May 19

bytedance released an open source model that attempts to do just about anything with only 3b parameters

ByteDance releases Lance, an open-source multimodal model with 3B active parameters. It supports image and video generation and editing in a single framework and was trained from scratch on a budget of 128 A100 GPUs.

Open source Image generation Video generation

GitHub Trending·May 19

heygen-com / hyperframes

Hyperframes is a framework that lets AI agents generate video content through HTML, designed to automate video creation in agent workflows.

AI Agents Video generation Tools

GitHub Trending·May 19

HKUDS / ViMax

ViMax is an agentic video generation system integrating director, screenwriter, producer, and video generator roles. The GitHub project presents a multi-agent architecture for end-to-end video creation orchestration.

AI Agents Multi-agent Video generation

arXiv cs.AI·May 19

Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion

Focused Forcing optimizes KV caches in autoregressive video diffusion generation by selecting relevant historical frames per-frame and per-head. The method combines attention scores with diversity scores, achieving 1.48× end-to-end acceleration without training while improving visual quality and text alignment.

Video generation Reasoning Evals
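
Selecting cached frames per head by combining attention relevance with a diversity score can be illustrated with a greedy picker in the style of maximal marginal relevance (MMR). This is a swapped-in formulation for illustration only: the weighting `lam`, the cosine-similarity redundancy term, and the use of one summary key vector per cached frame are hypothetical choices, not the paper's method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def select_frames(attn: List[float], frame_keys: List[List[float]],
                  budget: int, lam: float = 0.5) -> List[int]:
    """Greedily pick up to `budget` cached frames for one attention head.

    attn[i]: relevance of cached frame i (e.g. its summed attention mass).
    frame_keys[i]: one summary key vector per cached frame.
    Each step maximises lam * relevance - (1 - lam) * (max similarity to
    frames already chosen), so near-duplicate frames are penalised.
    """
    chosen: List[int] = []
    candidates = set(range(len(attn)))
    while candidates and len(chosen) < budget:
        def score(i: int) -> float:
            redundancy = max((cosine(frame_keys[i], frame_keys[j]) for j in chosen),
                             default=0.0)
            return lam * attn[i] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=score)  # ties go to the earliest frame
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)


def select_per_head(attn_by_head, keys_by_head, budget, lam=0.5):
    """Run the selection independently for every attention head."""
    return [select_frames(a, k, budget, lam) for a, k in zip(attn_by_head, keys_by_head)]


if __name__ == "__main__":
    attn = [0.9, 0.85, 0.6, 0.1]
    keys = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.7, 0.7]]  # frames 0 and 1 nearly identical
    print(select_frames(attn, keys, budget=2))           # [0, 2]
    print(select_frames(attn, keys, budget=2, lam=1.0))  # [0, 1]
```

Setting `lam = 1.0` reduces this to plain top-k by attention; lower values trade some relevance for coverage of distinct frames, which is the intuition behind adding a diversity term.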

arXiv cs.AI·May 19

ANVIL: Analogies and Videos for Lecturers

ANVIL is a multimodal generative system automating production of analogy-based instructional animations for computer science. Given a concept definition, it generates textual analogies, compiles them into structured visual screenplays, and produces executable manim code. Evaluation includes teacher studies and user adoption assessment.

Video generation Code generation Evals
