Research | 2026-05-05

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

arXiv:2507.14201v3. Abstract: We present ExCyTIn-Bench, the first benchmark to Evaluate an LLM agent X on the task of Cyber Threat Investigation through security questions derived from investigation graphs. Real-world security analysts must sift through a large number of...

Read the original article on arXiv cs.AI

arxiv, papers, agents