Research 2026-05-12

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

arXiv:2605.10780v1
Announce Type: cross

Abstract: Representation autoencoders that reuse frozen pretrained vision encoders as visual tokenizers have achieved strong reconstruction and generation quality. However, existing methods universally extract features from only the last encoder layer,...

Read Original Article on Arxiv CS.AI
