arXiv cs.AI · 19 May 2026

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

In three lines: DriveMoE proposes a Mixture-of-Experts (MoE) architecture for end-to-end autonomous driving. The model combines a Vision MoE, which dynamically selects cameras based on the driving context, with an Action MoE, which activates specialized experts for different driving behaviors. Built on the Drive-π₀ baseline, DriveMoE achieves state-of-the-art results on Bench2Drive by avoiding mode averaging.
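
To make "avoiding mode averaging" concrete, here is a minimal sketch of sparse expert routing in plain Python. It is not the paper's implementation. The top-k routing rule, the three toy action experts, the router scores, and every function name below are assumptions chosen for illustration. It shows how blending all experts can produce a compromise action that matches none of them, while activating only the top expert commits to one behavior.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Hypothetical routing rule: the summary does not say how DriveMoE routes.
    """
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / mass for i in chosen]


def mix(expert_outputs, indices, weights):
    """Weighted sum of the selected experts' output vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(w * expert_outputs[i][d] for i, w in zip(indices, weights))
        for d in range(dim)
    ]


# Toy action experts, each proposing (steering, acceleration). Values are made up.
expert_actions = [
    [-0.5, 0.0],  # "turn left"
    [0.5, 0.0],   # "turn right"
    [0.0, 0.3],   # "go straight"
]

# Made-up router scores: left and right are almost equally plausible.
router_scores = [2.0, 1.9, -1.0]

# Dense mixing (k = number of experts) blends left and right toward zero steering.
dense_idx, dense_w = top_k_route(router_scores, k=3)
dense_action = mix(expert_actions, dense_idx, dense_w)

# Sparse routing (k = 1) commits to the single best-scoring expert.
sparse_idx, sparse_w = top_k_route(router_scores, k=1)
sparse_action = mix(expert_actions, sparse_idx, sparse_w)

print("dense :", dense_action)
print("sparse:", sparse_action)
```

In this toy setup the dense mixture steers almost straight even though no expert with meaningful weight proposed that, whereas the sparse route returns the "turn left" expert's action unchanged. The same routing idea could in principle score camera views instead of action experts, though the summary does not say whether the Vision MoE works this way.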

Summary generated by Claude — human-verified
