Research, 2026-05-13

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Source: arXiv cs.AI

arXiv:2605.10780v2 (announce type: cross)

Abstract: Representation autoencoders that reuse frozen pretrained vision encoders as visual tokenizers have achieved strong reconstruction and generation quality. However, existing methods universally extract features from only the last encoder layer,...
