Research · 2026-05-12
Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
Source: arXiv cs.AI
arXiv:2605.08765v1 (announce type: cross)

Abstract: Unlearning in large language models (LLMs) aims to remove harmful training data while preserving overall utility. However, we find that existing methods often hallucinate, generate abnormal token sequences, or behave inconsistently, raising safety...
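The abstract frames unlearning as removing the influence of a forget set while preserving behavior on retained data. As context only, here is a minimal PyTorch sketch of one common formulation (gradient ascent on the forget set combined with a retain-set penalty); the toy model, random data, and the balance weight `lam` are illustrative assumptions, not the paper's method:

```python
# Minimal sketch of a gradient-ascent-style unlearning objective.
# NOT the paper's method; model, data, and `lam` are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy causal-LM stand-in: embedding + linear head over a tiny vocabulary.
vocab_size, dim = 32, 16
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def batch(n=8, seq=12):
    """Random (input, next-token target) pairs standing in for real text."""
    return (torch.randint(0, vocab_size, (n, seq)),
            torch.randint(0, vocab_size, (n, seq)))

forget_x, forget_y = batch()   # data whose influence we want removed
retain_x, retain_y = batch()   # data whose behavior we want preserved
lam = 1.0                      # assumed trade-off weight

for step in range(100):
    opt.zero_grad()
    # Ascend the loss on the forget set (negated term) while
    # descending it on the retain set.
    forget_loss = loss_fn(model(forget_x).transpose(1, 2), forget_y)
    retain_loss = loss_fn(model(retain_x).transpose(1, 2), retain_y)
    loss = -forget_loss + lam * retain_loss
    loss.backward()
    opt.step()
```

The failure modes the abstract reports (hallucination, abnormal token sequences, inconsistent behavior) are exactly the risk of the negated forget term above: unbounded ascent can push the model into degenerate regions that the retain penalty does not fully guard against.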