BeClaude
Research | 2026-05-05

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

Source: arXiv cs.AI

arXiv:2507.14201v3

Abstract: We present ExCyTIn-Bench, the first benchmark to Evaluate an LLM agent X on the task of Cyber Threat Investigation through security questions derived from investigation graphs. Real-world security analysts must sift through a large number of...

Tags: arxiv, papers, agents