Research · 2026-04-17
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
Source: Arxiv CS.AI
arXiv:2506.03610v3 Announce Type: replace Abstract: Large Language Model (LLM) agents are reshaping the game industry by enabling more intelligent and human-preferable characters. Yet current game benchmarks fall short of practical needs: they lack evaluations of diverse LLM capabilities across...
Tags: arxiv · papers · agents · benchmark