Research 2026-05-06

Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance

arXiv:2605.01699v1 (announce type: cross)

Abstract: Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgically removed without measurable capability cost....

Read Original Article on Arxiv CS.AI
