Research | 2026-05-12
When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents
Source: arXiv cs.AI
arXiv:2605.08828v1 | Announce Type: new
Abstract: Large language model agents increasingly operate through environment-facing scaffolds that expose files, web pages, APIs, and logs. These observations influence tool use, state tracking, and action sequencing, yet their reliability and authority are...
Tags: arxiv, papers, agents, benchmark