Improved Baselines with Representation Autoencoders
Signal: 78
Hype: 15
In three lines: Representation Autoencoders v2 (RAEv2) improves on VAEs by using pretrained vision encoders. The authors find that summing the last k encoder layers, combining RAE with REPA (representation alignment), and re-parameterizing classifier-free guidance together accelerate convergence 10x. RAEv2 achieves a gFID of 1.06 in 80 epochs on ImageNet-256 and an EP_FID@2 of 35 epochs.
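To make "summing the last k encoder layers" concrete, here is a minimal sketch. It assumes the encoder exposes one feature vector per layer and that the sum is plain and unweighted; the summary does not say whether the paper normalizes or weights the layers, so treat those details as assumptions. The function name `sum_last_k_layers` and the toy inputs are hypothetical.

```python
def sum_last_k_layers(hidden_states, k):
    """Element-wise, unweighted sum of the last k per-layer feature vectors.

    hidden_states: list of equal-length lists of floats, ordered from the
    first encoder layer to the last (a hypothetical layout for illustration).
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    selected = hidden_states[-k:]  # keep only the deepest k layers
    # zip(*selected) groups the values that share a feature index across layers
    return [sum(values) for values in zip(*selected)]


# Toy example: three layers with two features each
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
combined = sum_last_k_layers(layers, k=2)
```

With k=1 this reduces to using only the final layer's features, which would be the baseline to compare against.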
Summary generated by Claude — human-verified