Research | 2026-05-14

Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models

arXiv:2605.13119v1 | Announce Type: cross

Abstract: Vision-language-action (VLA) models are effective robot action executors, but they remain limited on long-horizon tasks due to the dual burden of extended closed-loop planning and diverse physical operations. We therefore propose VLAs-as-Tools, a...

Read the original article on arXiv cs.AI

arxiv, papers, agents, vision