Research | 2026-05-14
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
Source: arXiv cs.AI
arXiv:2605.13119v1 (announce type: cross)
Abstract: Vision-language-action (VLA) models are effective robot action executors, but they remain limited on long-horizon tasks due to the dual burden of extended closed-loop planning and diverse physical operations. We therefore propose VLAs-as-Tools, a...
Tags: arxiv, papers, agents, vision