arXiv cs.AI·19 May 2026

Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving

In three lines: FastDrive, a compact 0.9B-parameter VLM, outperforms 7B+ models (LLaVA-1.5) on autonomous driving tasks. Trained on NuScenes-S, a benchmark with structured representations, it achieves +20% accuracy on decision-making with a 10x inference speedup.

Vision · Reasoning · Benchmarks · Code generation · Robotics

Summary generated by Claude — human-verified
