Research | 2026-04-28
Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents
Source: arXiv cs.AI
arXiv:2604.23990v1 (Announce Type: new)
Abstract: This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static input-output mapping to a runtime system, the basic...