Research 2026-05-12

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Source: arXiv cs.AI

arXiv:2605.10780v1 (announce type: cross)

Abstract: Representation autoencoders that reuse frozen pretrained vision encoders as visual tokenizers have achieved strong reconstruction and generation quality. However, existing methods universally extract features from only the last encoder layer,...
