arXiv cs.LG

GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

Signal: 78
Hype: 25
In three lines: GROW adapts GRPO (Group Relative Policy Optimization) for VLM agents by decomposing trajectories into state-action samples to avoid excessively long contexts. Tested on 800+ Minecraft tasks, the method achieves SOTA in multi-turn RL for open-world agents.
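
To make the decomposition idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the standard GRPO recipe of normalizing each trajectory's reward against its group's mean and standard deviation, and it assumes (the summary does not say) that every state-action sample simply inherits its trajectory's advantage. The names `Trajectory`, `StepSample`, `group_relative_advantages`, and `decompose` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass
class Trajectory:
    # (state, action) pairs; strings stand in for observations and actions.
    steps: List[Tuple[str, str]]
    reward: float  # one scalar reward for the whole rollout


@dataclass
class StepSample:
    state: str
    action: str
    advantage: float


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decompose(group: List[Trajectory]) -> List[StepSample]:
    """Flatten a group of rollouts into independent state-action samples.

    Assumption: each step copies its trajectory's group-relative advantage,
    so a training batch never has to hold a full multi-turn context.
    """
    advantages = group_relative_advantages([t.reward for t in group])
    samples: List[StepSample] = []
    for traj, adv in zip(group, advantages):
        for state, action in traj.steps:
            samples.append(StepSample(state, action, adv))
    return samples


# Hypothetical group of two rollouts for the same task.
group = [
    Trajectory(steps=[("s0", "chop"), ("s1", "craft")], reward=1.0),
    Trajectory(steps=[("s0", "walk"), ("s1", "walk"), ("s2", "idle")], reward=0.0),
]
samples = decompose(group)
```

Each `StepSample` can then be scored on its own, which is what keeps the context length bounded by a single step rather than by the whole episode.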
Reinforcement learning, Vision, AI Agents, Papers

Summary generated by Claude — human-verified