BeClaude
Research 2026-05-07

What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity

Source: arXiv cs.AI

arXiv:2605.03782v1

Abstract: To navigate partially observable visual environments, recent VLM agents increasingly internalize world modeling capabilities into their policies via explicit CoT reasoning, enabling them to mentally simulate futures before acting. However, relying...

arxiv, papers, agents