Research · 2026-05-12
Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
Source: arXiv cs.AI
arXiv:2605.08765v1 (announce type: cross)

Abstract: Unlearning in large language models (LLMs) aims to remove harmful training data while preserving overall utility. However, we find that existing methods often hallucinate, generate abnormal token sequences, or behave inconsistently, raising safety...
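The abstract frames unlearning as removing the influence of a forget set while preserving behavior on retained data. As context only, here is a minimal PyTorch sketch of one common formulation (gradient ascent on the forget set combined with a retain-set penalty); the toy model, random data, and the balance weight `lam` are illustrative assumptions, not the paper's method:

```python
# Minimal sketch of a gradient-ascent-style unlearning objective.
# NOT the paper's method; model, data, and `lam` are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy causal-LM stand-in: embedding + linear head over a tiny vocabulary.
vocab_size, dim = 32, 16
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def batch(n=8, seq=12):
    """Random (input, next-token target) pairs standing in for real text."""
    return (torch.randint(0, vocab_size, (n, seq)),
            torch.randint(0, vocab_size, (n, seq)))

forget_x, forget_y = batch()   # data whose influence we want removed
retain_x, retain_y = batch()   # data whose behavior we want preserved
lam = 1.0                      # assumed trade-off weight

for step in range(100):
    opt.zero_grad()
    # Ascend the loss on the forget set (negated term) while
    # descending it on the retain set.
    forget_loss = loss_fn(model(forget_x).transpose(1, 2), forget_y)
    retain_loss = loss_fn(model(retain_x).transpose(1, 2), retain_y)
    loss = -forget_loss + lam * retain_loss
    loss.backward()
    opt.step()
```

The failure modes the abstract reports (hallucination, abnormal token sequences, inconsistent behavior) are exactly the risk of the negated forget term above: unbounded ascent can push the model into degenerate regions that the retain penalty does not fully guard against.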