Research2026-04-28
Evaluation of Prompt Injection Defenses in Large Language Models
Source: Arxiv CS.AI
arXiv:2604.23887v1 Announce Type: cross Abstract: LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and tested it against nine defense configurations...
arxivpapersprompting