arXiv cs.AI

Improved Baselines with Representation Autoencoders

Signal: 78
Hype: 15
In three lines

Representation Autoencoders v2 (RAEv2) improves on VAEs by using pretrained vision encoders. The authors find that three changes together accelerate convergence 10x: summing the last k encoder layers, combining RAE with REPA (representation alignment), and re-parameterizing classifier-free guidance. RAEv2 achieves a gFID of 1.06 in 80 epochs on ImageNet-256 and an EP_FID@2 of 35 epochs.
Vision, Image generation, Benchmarks, Papers

Summary generated by Claude — human-verified