Research · 2026-05-07
Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study
Source: arXiv cs.AI
arXiv:2604.17896v2 (Announce Type: replace-cross)
Abstract: Vision-Language-Action (VLA) models map multimodal inputs directly to robot actions and are typically trained through large-scale imitation learning. While this paradigm has shown strong performance, prevailing VLA training procedures do not...