arXiv cs.AI · 19 May 2026

Improved Baselines with Representation Autoencoders

In three lines: Representation Autoencoders v2 (RAEv2) improves on VAEs by using pretrained vision encoders. The authors find that summing the last k encoder layers, combining RAE with REPA (representation alignment), and re-parameterizing classifier-free guidance accelerate convergence 10x. RAEv2 achieves a gFID of 1.06 in 80 epochs on ImageNet-256 and an EP_FID@2 of 35 epochs.

Vision · Image generation · Benchmarks · Papers

Summary generated by Claude — human-verified
