BeClaude
Policy · 2026-05-11

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

Source: arXiv cs.AI

arXiv:2605.07931v1 (announce type: cross)

Abstract: Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet how such modules should be parameterized on top of a pretrained VLA remains an open design question. Existing world-model-augmented VLAs...

arxiv, papers