BeClaude
Research | 2026-05-08

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

Source: arXiv cs.AI

arXiv:2605.06652v1 (Announce Type: cross)

Abstract: Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the...

Tags: arxiv, papers, benchmark, safety