Research2026-06-30

Memory as an Attack Surface in LLM Agents: A Study on Multiple-Choice Question Answering

Originally published byArxiv CS.AI

arXiv:2606.29030v1 Announce Type: new Abstract: AI agents extend conventional large language model (LLM) applications by integrating language understanding with task execution, external tool use, and memory mechanisms. While memory allows agents to retain prior interactions and provide more...

Memory as a New Attack Vector

A new preprint from arXiv (2606.29030v1) has identified a previously underappreciated vulnerability in LLM-based agents: the memory system itself. The researchers demonstrate that an agent’s stored context—designed to enable coherent multi-turn interactions—can be exploited to manipulate downstream task performance, specifically in multiple-choice question answering scenarios. By injecting carefully crafted adversarial content into the memory buffer, attackers can steer an agent toward incorrect answers or override its factual reasoning.

This is not a traditional prompt injection attack targeting the model’s input window. Instead, it exploits the persistence layer: the memory that accumulates across sessions or turns. The study shows that even when the agent receives a clean, benign query, corrupted memory entries can poison its reasoning process. The attack surface expands because memory is often treated as a trusted, static repository rather than a dynamic, potentially hostile channel.

Why This Matters

The finding has significant implications for the growing ecosystem of AI agents. As developers rush to equip LLMs with memory—for personal assistants, customer service bots, coding copilots, and research tools—they are inadvertently creating a new class of security vulnerabilities. Memory poisoning differs from data poisoning in training sets; it operates at inference time and can be executed by any user who interacts with the agent over multiple sessions.

For enterprise deployments, this is particularly concerning. An agent that retains user preferences, project context, or historical decisions could be manipulated by a single malicious interaction. The attacker does not need access to the model weights, training data, or infrastructure—only the ability to send messages that get stored in the agent’s memory. This lowers the barrier to entry for adversarial attacks dramatically.

The research also highlights a gap in current safety evaluations. Most red-teaming efforts focus on single-turn prompt injection, jailbreaking, or output filtering. Memory-based attacks require a different evaluation methodology: one that tests the agent’s resilience to cumulative, cross-turn contamination. Without such testing, agents may appear safe in isolated interactions while remaining vulnerable to long-term manipulation.

Implications for AI Practitioners

Developers building agentic systems should immediately audit their memory architectures. Key questions include: Is memory content validated before being stored? Are there mechanisms to detect anomalous or contradictory entries? Can the agent distinguish between trusted long-term memory and untrusted session-specific context?

One practical mitigation is to implement memory isolation—separating user-provided content from system-generated or verified facts. Another is to apply input sanitization and anomaly detection to memory writes, similar to how databases guard against SQL injection. Additionally, agents could be designed to periodically re-verify critical facts against a trusted knowledge base, rather than blindly trusting stored context.

The research also suggests that memory should be treated as a first-class security concern in agent frameworks like LangChain, AutoGPT, and custom implementations. Current best practices around prompt engineering and output filtering are insufficient if the memory channel remains unprotected.

Key Takeaways

Memory systems in LLM agents represent a new, exploitable attack surface that can be poisoned across multiple interactions to corrupt downstream reasoning.
This vulnerability is distinct from prompt injection and data poisoning, requiring novel detection and mitigation strategies focused on memory validation and isolation.
Enterprise deployments should audit memory architectures immediately, implementing write-time sanitization and cross-referencing against trusted sources.
Safety evaluation frameworks must expand to include cross-turn, memory-based attacks rather than focusing solely on single-turn adversarial inputs.

Read Original Article on Arxiv CS.AI

arxivpapersagents