Research 2026-04-24
SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs
Source: arXiv cs.AI
arXiv:2604.20930v1. Announce type: cross.
Abstract: Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion structurally requires harmful content, spontaneously generate that content with safety failure rates...
arxiv, papers, safety