Research | 2026-04-17

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

arXiv:2604.14125v1 (announce type: cross)

Abstract: While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models...

Read Original Article on Arxiv CS.AI
