BeClaude
Research 2026-04-28

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents

Source: arXiv cs.AI

arXiv:2604.23990v1

Abstract: This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static input-output mapping to a runtime system, the basic...

Tags: arxiv, papers, agents