Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding (10x faster than Qwen3-VL)
Signal: 75
Hype: 35
In three lines
Nvidia releases LocateAnything, a 3B vision-language grounding model. It uses parallel box decoding and runs 10x faster than Qwen3-VL. Code and a demo are available on Hugging Face.
Summary generated by Claude — human-verified