GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
Signal: 78
Hype: 25
In three lines
GROW adapts GRPO (Group Relative Policy Optimization) for VLM agents by decomposing trajectories into state-action samples, which avoids excessively long contexts. Tested on 800+ Minecraft tasks, the method achieves SOTA in multi-turn RL for open-world agents.
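To make the decomposition idea concrete, here is a minimal sketch of how a group of sampled trajectories might be flattened into per-step training samples. It is not the GROW implementation. It assumes the standard GRPO advantage (each trajectory's return normalized by the group mean and standard deviation) and assumes that every state-action pair simply inherits its trajectory's advantage; the summary does not say how GROW assigns credit. The names `Sample`, `group_relative_advantages`, and `decompose` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple

# One (state, action) step; a trajectory is a list of such steps.
Step = Tuple[str, str]


@dataclass
class Sample:
    """A single training sample: short context instead of a full trajectory."""
    state: str
    action: str
    advantage: float


def group_relative_advantages(returns: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style normalization of returns within one group.

    Population standard deviation is an illustrative choice here.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def decompose(group: List[List[Step]], returns: List[float]) -> List[Sample]:
    """Flatten a group of trajectories into independent state-action samples.

    Assumption (not from the source): each step inherits the advantage of
    the trajectory it came from.
    """
    advantages = group_relative_advantages(returns)
    samples: List[Sample] = []
    for trajectory, advantage in zip(group, advantages):
        for state, action in trajectory:
            samples.append(Sample(state, action, advantage))
    return samples


# Two rollouts of the same hypothetical task: one succeeds, one fails.
group = [
    [("see tree", "chop"), ("have log", "craft planks")],
    [("see tree", "walk away")],
]
samples = decompose(group, returns=[1.0, 0.0])
```

Each resulting sample carries only one state and one action, so the policy update never has to fit a whole multi-turn trajectory into a single context.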
Summary generated by Claude — human-verified