Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
VideoDR is the first benchmark for open-domain video question answering. Its tasks combine cross-frame visual extraction, iterative web retrieval, and multi-hop reasoning. An evaluation of closed-source and open-source multimodal models shows that the Agentic paradigm is not consistently superior to the Workflow paradigm; the key challenges are goal drift and long-horizon consistency.
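
To make the three ingredients named above concrete, here is a minimal sketch of a multi-hop lookup that starts from a visual cue and feeds each retrieved result into the next query. It is only an illustration under stated assumptions, not the benchmark's pipeline or either evaluated paradigm: `multi_hop_answer`, `toy_search`, `TOY_WEB`, and the example cue and question are all hypothetical, and a dictionary stands in for the open web.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical search interface: returns one text result, or None if nothing is found.
Search = Callable[[str], Optional[str]]


def multi_hop_answer(cues: List[str], question: str, search: Search,
                     max_hops: int = 3) -> List[str]:
    """Chain lookups: start from the first visual cue, then search on each result.

    Falls back to the raw question when no cue was extracted from the video.
    """
    trace: List[str] = []
    query = cues[0] if cues else question
    for _ in range(max_hops):
        hit = search(query)
        if hit is None:          # dead end: stop rather than wander off-goal
            break
        trace.append(hit)
        query = hit              # next hop conditions on the latest evidence
    return trace


# Toy stand-in for web retrieval (assumption: exact-match lookup only).
TOY_WEB: Dict[str, str] = {
    "red tower logo": "Acme Museum",
    "Acme Museum": "founded 1901",
}


def toy_search(query: str) -> Optional[str]:
    return TOY_WEB.get(query)


trace = multi_hop_answer(
    cues=["red tower logo"],     # pretend this cue was read off several frames
    question="When was the museum shown in the video founded?",
    search=toy_search,
)
print(trace)
```

The `max_hops` cap and the early exit on an empty result are simple guards on the length of the search chain; they are design choices of this sketch and are not drawn from the source.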