Research | 2026-05-05
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction
Source: Arxiv CS.AI
arXiv:2505.10887v3 (Announce Type: replace)
Abstract: This paper introduces InfantAgent-Next, a generalist agent capable of interacting with computers in a multimodal manner, encompassing text, images, audio, and video. Unlike existing approaches that either build intricate workflows around...
arxiv, papers, agents, multimodal