BeClaude
Research 2026-05-12

When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents

Source: arXiv cs.AI

arXiv:2605.08828v1 (Announce Type: new)

Abstract: Large language model agents increasingly operate through environment-facing scaffolds that expose files, web pages, APIs, and logs. These observations influence tool use, state tracking, and action sequencing, yet their reliability and authority are...

arxiv, papers, agents, benchmark