Research · 2026-04-27
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
Source: arXiv cs.AI
arXiv:2505.20435v3 (announce type: replace-cross)
Abstract: Existing interpretability methods for Large Language Models (LLMs) predominantly capture linear directions or isolated features. This overlooks the high-dimensional, relational, and nonlinear geometry of model representations. We apply...
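The excerpt truncates before the paper's method, but the core tool it names, persistent homology, can be illustrated in its simplest (0-dimensional) form: tracking when connected components of a point cloud are born and merge as a distance threshold grows. The sketch below is an assumption-laden toy, not the paper's pipeline; it computes H0 persistence pairs with a union-find over sorted pairwise distances. Capturing the loops and voids the paper alludes to (H1 and higher) would require a TDA library such as ripser or giotto-tda.

```python
# Minimal sketch (NOT the paper's method, which the excerpt truncates):
# 0-dimensional persistent homology of a point cloud, i.e. the birth and
# death of connected components in a Vietoris-Rips filtration.
import math
from itertools import combinations

def h0_persistence(points):
    """Return the finite H0 persistence pairs (birth, death).

    Every component is born at filtration value 0; a component dies at the
    edge length that first merges it into another component. The single
    infinitely-persistent component is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges of the filtration, sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge: one component dies at scale d
            pairs.append((0.0, d))
    return pairs

# Two well-separated clusters: the inter-cluster merge shows up as one
# long-lived H0 bar, the intra-cluster merges as two short ones.
cloud = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
print(h0_persistence(cloud))
```

A long bar in this barcode signals a robust topological feature (here, a well-separated cluster); short bars are noise. The paper's premise is that such multi-scale, relational structure is invisible to purely linear probing.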