Research · 2026-04-30
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
Source: arXiv cs.AI
arXiv:2601.04389v2 (Announce Type: replace-cross)
Abstract: Current safety evaluations of large language models (LLMs) create a dangerous illusion of universal protection by aggregating harms under generic categories such as "Identity Hate", obscuring vulnerabilities toward specific populations. In...
Tags: arxiv, papers, safety