Research · 2026-04-27
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
Source: arXiv cs.AI
arXiv:2505.20435v3 (announce type: replace-cross)
Abstract: Existing interpretability methods for Large Language Models (LLMs) predominantly capture linear directions or isolated features. This overlooks the high-dimensional, relational, and nonlinear geometry of model representations. We apply...
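The excerpt truncates before the paper's method, but the core tool it names, persistent homology, can be illustrated in its simplest (0-dimensional) form: tracking when connected components of a point cloud are born and merge as a distance threshold grows. The sketch below is an assumption-laden toy, not the paper's pipeline; it computes H0 persistence pairs with a union-find over sorted pairwise distances. Capturing the loops and voids the paper alludes to (H1 and higher) would require a TDA library such as ripser or giotto-tda.

```python
# Minimal sketch (NOT the paper's method, which the excerpt truncates):
# 0-dimensional persistent homology of a point cloud, i.e. the birth and
# death of connected components in a Vietoris-Rips filtration.
import math
from itertools import combinations

def h0_persistence(points):
    """Return the finite H0 persistence pairs (birth, death).

    Every component is born at filtration value 0; a component dies at the
    edge length that first merges it into another component. The single
    infinitely-persistent component is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges of the filtration, sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge: one component dies at scale d
            pairs.append((0.0, d))
    return pairs

# Two well-separated clusters: the inter-cluster merge shows up as one
# long-lived H0 bar, the intra-cluster merges as two short ones.
cloud = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
print(h0_persistence(cloud))
```

A long bar in this barcode signals a robust topological feature (here, a well-separated cluster); short bars are noise. The paper's premise is that such multi-scale, relational structure is invisible to purely linear probing.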