Research | 2026-05-14

Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

arXiv:2508.07642v4 (announce type: replace)

Abstract: Vision-and-Language Navigation (VLN) poses significant challenges for agents to interpret natural language instructions and navigate complex 3D environments. While recent progress has been driven by large-scale pre-training and data augmentation,...

Read Original Article on Arxiv CS.AI

arxiv, papers, agents, vision