Research · 2026-05-14
Persona-Model Collapse in Emergent Misalignment
Source: arXiv cs.AI
arXiv:2605.12850v1 (Announce Type: cross)
Abstract: Fine-tuning large language models on narrow data with harmful content produces broadly misaligned behavior on unrelated prompts, a phenomenon known as emergent misalignment. We propose that emergent misalignment involves persona-model collapse:...