Research | 2026-05-12
Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability
Source: arXiv cs.AI
arXiv:2605.10516v1 Announce Type: new Abstract: This paper establishes a rigorous measurement science for AI agent reliability, providing a foundational framework for quantifying consistency under semantically preserving perturbations. By leveraging $U$-statistics for output-level reliability and...
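As a rough illustration of how a $U$-statistic can quantify output-level consistency, the sketch below averages a pairwise agreement kernel over all unordered pairs of outputs collected from semantically equivalent prompts. The excerpt does not state the paper's kernel or estimator, so the exact-match kernel, the function name `pairwise_agreement_u_statistic`, and the sample outputs are all hypothetical assumptions, not the authors' method.

```python
from itertools import combinations


def pairwise_agreement_u_statistic(outputs, kernel=None):
    """Order-2 U-statistic: the mean of kernel(a, b) over all unordered pairs.

    The default kernel is exact-match agreement (an assumption made for this
    sketch; a semantic-similarity kernel could be substituted).
    """
    if kernel is None:
        kernel = lambda a, b: 1.0 if a == b else 0.0
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs to form a pair")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


# Hypothetical answers from one agent to four paraphrases of the same task.
outputs = ["42", "42", "42", "41"]
consistency = pairwise_agreement_u_statistic(outputs)
print(consistency)  # 3 agreeing pairs out of 6 -> 0.5
```

A value of 1.0 would mean every pair of perturbed runs agreed, while values near 0 would indicate that the agent's answers vary with surface rewording. Averaging over pairs rather than comparing against a single reference run is what makes this an order-2 $U$-statistic.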
arxiv, papers, agents