BeClaude
Research · 2026-04-22

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Source: arXiv cs.AI

arXiv:2604.19533v1 · Announce Type: cross

Abstract: We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints,...
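The abstract frames threat hunting as open-ended querying of raw Windows event logs, with no guiding questions. To illustrate a single hunting step an agent might take, here is a minimal sketch. It assumes a hypothetical SQLite table `events` (columns `ts`, `host`, `event_id`, `account`). The schema, the toy sample rows, the threshold, and the function name `hunt_failed_logons` are illustrative inventions, not the benchmark's actual database layout or scoring method. The query flags host/account pairs with repeated failed logons (Windows Security event ID 4625).

```python
import sqlite3

# Hypothetical schema: the benchmark's real database layout is not described
# in the excerpt. These columns are illustrative only.
SCHEMA = """
CREATE TABLE events (
    ts TEXT,          -- ISO-8601 timestamp
    host TEXT,        -- computer name
    event_id INTEGER, -- Windows Security event ID
    account TEXT      -- target account name
)
"""

# Windows Security event 4625 = "An account failed to log on".
FAILED_LOGON = 4625


def hunt_failed_logons(conn, threshold=3):
    """Return (host, account, failures) rows with at least `threshold` failed logons."""
    query = """
        SELECT host, account, COUNT(*) AS failures
        FROM events
        WHERE event_id = ?
        GROUP BY host, account
        HAVING COUNT(*) >= ?
        ORDER BY failures DESC, host, account
    """
    return conn.execute(query, (FAILED_LOGON, threshold)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Synthetic toy rows for demonstration, not real benchmark data.
    sample = [
        ("2026-04-01T10:00:00Z", "WS01", 4625, "alice"),
        ("2026-04-01T10:00:05Z", "WS01", 4625, "alice"),
        ("2026-04-01T10:00:09Z", "WS01", 4625, "alice"),
        ("2026-04-01T10:01:00Z", "WS01", 4624, "alice"),  # 4624 = successful logon
        ("2026-04-01T11:00:00Z", "WS02", 4625, "bob"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", sample)
    for row in hunt_failed_logons(conn):
        print(row)
```

Running the script prints `('WS01', 'alice', 3)`. The 4624 success row is excluded by the filter, and `bob` falls below the threshold. An agentic hunt would chain many such queries and pivot on their results. This sketch only shows the shape of one step.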

Tags: arxiv, papers, agents, benchmark