arXiv cs.AI · 19 May 2026

Identifiable Token Correspondence for World Models

In three lines: A Transformer-based world model for video frame prediction. It formulates next-frame prediction as structured probabilistic inference with latent token-correspondence variables: each token is either copied from the previous frame or newly generated. The paper reports state-of-the-art results on 4 benchmarks, including 72.5% return and 35.6% score on Craftax-classic (vs. 67.4% and 27.9% previously).
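
To make the copy-or-generate idea concrete, here is a minimal sketch based only on the summary above, not on the paper's actual architecture or code. It assumes each next-frame token has a latent correspondence variable that either points at one previous-frame token (copy) or selects a "new" option (generate), and it marginalizes over that variable to get a distribution over token ids. The function name, argument layout, and the convention that column 0 means "generate new" are all hypothetical.

```python
def next_token_distribution(prev_tokens, corr_probs, gen_probs):
    """Marginalize a copy-or-generate mixture for each next-frame token.

    prev_tokens: list of N token ids from the previous frame.
    corr_probs:  M rows of N + 1 probabilities (each row sums to 1).
                 Column 0 is "generate new"; column j + 1 is "copy the
                 id of previous token j" (an assumed layout).
    gen_probs:   M rows of V probabilities over the vocabulary, used
                 only when the token is generated.
    Returns M rows of V probabilities over the next token id.
    """
    result = []
    for corr_row, gen_row in zip(corr_probs, gen_probs):
        p_new, p_copy = corr_row[0], corr_row[1:]
        # Generation branch: scale the generative distribution.
        dist = [p_new * g for g in gen_row]
        # Copy branch: move mass onto the id of each candidate source token.
        for j, p in enumerate(p_copy):
            dist[prev_tokens[j]] += p
        result.append(dist)
    return result


prev_tokens = [2, 0]                          # previous frame: two tokens
corr_probs = [[0.0, 1.0, 0.0],                # token 0: surely copies prev token 0
              [0.5, 0.0, 0.5]]                # token 1: half new, half copy prev token 1
gen_probs = [[0.25, 0.25, 0.25, 0.25],
             [0.1, 0.2, 0.3, 0.4]]
dists = next_token_distribution(prev_tokens, corr_probs, gen_probs)
```

In this toy setup a token that copies with certainty gets a one-hot distribution on the copied id, while an uncertain token blends the copied id with the generative distribution. How the paper parameterizes or infers the correspondence variables is not stated in the summary.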

Tags: Reasoning, Vision, Reinforcement learning, Papers, Benchmarks

Summary generated by Claude — human-verified
