Identifiable Token Correspondence for World Models
Signal: 78
Hype: 25
In three lines: A Transformer-based world model for video frame prediction. It formulates next-frame prediction as structured probabilistic inference over latent token-correspondence variables: each token in the next frame is either copied from the previous frame or newly generated. It reports state-of-the-art results on 4 benchmarks, including 72.5% return and 35.6% score on Craftax-classic (vs. 67.4% and 27.9% for prior work).
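To make the copy-or-generate idea concrete, here is a minimal sketch of how a latent correspondence variable could be marginalized out for a single next-frame token. It is an illustration under stated assumptions, not the paper's implementation: the function name `next_token_distribution`, its arguments, and the toy numbers are all hypothetical, and the summary does not say how the actual model parameterizes or infers these probabilities.

```python
def next_token_distribution(prev_tokens, copy_probs, new_prob, new_token_dist):
    """Marginal distribution over one next-frame token (hypothetical sketch).

    prev_tokens:    token ids of the previous frame (candidate copy sources).
    copy_probs:     copy_probs[j] is the probability of copying prev_tokens[j].
    new_prob:       probability that the token is newly generated.
    new_token_dist: distribution over the vocabulary used when generating.
    """
    total = sum(copy_probs) + new_prob
    if abs(total - 1.0) > 1e-9:
        raise ValueError("correspondence probabilities must sum to 1")
    # "Generate new" branch: spread its mass according to new_token_dist.
    dist = [new_prob * p for p in new_token_dist]
    # Each "copy" branch puts all of its mass on the copied token id.
    for token_id, p_copy in zip(prev_tokens, copy_probs):
        dist[token_id] += p_copy
    return dist


# Toy example (assumed values): vocabulary of 4 ids, previous frame of 3 tokens.
dist = next_token_distribution(
    prev_tokens=[2, 0, 2],
    copy_probs=[0.5, 0.2, 0.1],
    new_prob=0.2,
    new_token_dist=[0.25, 0.25, 0.25, 0.25],
)
```

In this toy case most of the probability mass lands on token id 2, because two of the three copy sources carry that id; the "generate new" branch only contributes a small uniform share.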
Summary generated by Claude — human-verified