Research 2026-04-22
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
Source: arXiv cs.AI
arXiv:2604.19533v1 (announce type: cross)
Abstract: We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints,...
arxiv papers agents benchmark