Research · 2026-04-30
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
Source: arXiv cs.AI
arXiv:2601.04389v2 (Announce Type: replace-cross)
Abstract: Current safety evaluations of large language models (LLMs) create a dangerous illusion of universal protection by aggregating harms under generic categories such as "Identity Hate", obscuring vulnerabilities toward specific populations. In...
Tags: arxiv, papers, safety