BeClaude
Research, 2026-05-05

InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

Source: arXiv cs.AI

arXiv:2505.10887v3. Announce type: replace. Abstract: This paper introduces InfantAgent-Next, a generalist agent capable of interacting with computers in a multimodal manner, encompassing text, images, audio, and video. Unlike existing approaches that either build intricate workflows around...

arxiv, papers, agents, multimodal