arXiv cs.AI·27 May 2026

Advancing Creative Physical Intelligence in Large Multimodal Models

In three lines: MM-CreativityBench, a new benchmark, evaluates the ability of large multimodal models (LMMs) to solve creative problems by identifying non-obvious object uses in physically constrained environments. Current LMMs fail due to insufficient grounded exploration and hallucinations. Affordance-grounded alignment via Direct Preference Optimization reduces these errors and improves entity selection.

Benchmarks Vision Reasoning Alignment

Summary generated by Claude — human-verified
