Research — 2026-05-12
LLM-Agnostic Semantic Representation Attack
Source: arXiv cs.AI
arXiv:2605.08898v1 (Announce Type: cross)

Abstract: Large Language Models (LLMs) increasingly employ alignment techniques to prevent harmful outputs. Despite these safeguards, attackers can circumvent them by crafting adversarial prompts. Predominant token-level optimization methods primarily rely on...