Summary
Agent memory poisoning is a persistent prompt-injection class where attacker instructions delivered through untrusted content are written into an assistant's long-term memory, so the directive survives across future independent sessions. The low-level mechanism abuses the model's memory tool: indirect injection (for example a malicious web page or document the model summarizes) causes the agent to invoke its memory function and store an attacker-controlled instruction, which is then re-loaded into every subsequent conversation's context. Johann Rehberger demonstrated this as 'SpAIware' on September 20, 2024 against the ChatGPT macOS app, chaining memory injection with an image-rendering exfiltration channel that bypassed the url_safe mitigation to continuously leak conversations to an attacker server; he showed the same delayed-tool-invocation memory poisoning against Google Gemini in February 2025. The class maps to OWASP LLM01:2025 Prompt Injection and improper output/memory handling.
How to avoid it in your code
- Treat memory writes as sensitive actions requiring explicit user confirmation before persisting.
- Show, log and let users review or delete every stored memory entry.
- Isolate untrusted retrieved content from instruction-execution context during summarization.
- Block outbound image/URL rendering to non-allowlisted domains to cut exfiltration channels.
- Apply content classifiers to detect injection and delayed-trigger patterns in inputs.
References
Related vulnerabilities
All AI/LLM →- HIGHAI-AGENT-INDIRECT-PROMPT-INJECTION-2025
Coding agents that autonomously read project and external content are vulnerable to indirect prompt injection, where hidden instructions placed in untrusted material the agent ingests hijack its behavior. The injection surface is broad: a poisoned README, source-code comment, GitHub issue or PR comment, a dependency's files, a fetched web page, or an MCP tool description, with instructions often concealed using invisible Unicode characters so a human reviewer never sees them, as Pillar Security demonstrated with the 'Rules File Backdoor' technique. Because the agent cannot distinguish trusted developer instructions from attacker text in the data it processes, the injected commands can direct it to insert a backdoor, weaken security controls, exfiltrate secrets, or run shell/MCP commands. Johann Rehberger (Embrace The Red) proved the data-exfiltration variant in Cursor with CVE-2025-54132 (disclosed June 30, 2025, fixed in v1.3): a comment-embedded payload made Cursor render a Mermaid diagram containing an attacker image URL, auto-firing an outbound request that leaked API keys and agent memory without confirmation. When the developer merges or runs the agent's resulting output unmonitored, the attacker-controlled changes land directly in the codebase or on the developer's machine.
- HIGHAI-REMOTELI-BOT-2022
In mid-September 2022 the remoteli.io Twitter bot, a GPT-3-powered account that auto-replied to tweets about remote work, became the first viral customer-facing prompt-injection case. The bot built each request by concatenating its fixed instruction prompt with the raw text of a user's tweet and sending the combined string to the GPT-3 API, with no boundary between the operator's trusted instructions and the untrusted tweet. Because the model treats all tokens equally, a tweet containing 'ignore the above and ...' was processed as a higher-priority instruction, letting any user override the bot's original task. Users made the bot threaten people, claim responsibility for the Challenger space shuttle disaster, and post content violating platform policy. Riley Goodside publicized the technique on September 12 and Simon Willison coined the term 'prompt injection' the next day, comparing it to SQL injection against unsanitized input.
- CRITICALAI-GROK-BANKR-WALLET-2026
In early May 2026 an attacker drained roughly $150,000 from an AI-powered crypto trading agent on X (Twitter) through prompt injection, an exploit of Grok and the linked Bankrbot agent documented by AI-security researchers including Giskard and NeuralTrust. The attacker posted a Morse-code-encoded message on X and asked Grok to translate it; Grok decoded the obfuscated payload, which contained hidden financial instructions, and the encoding let the untrusted post slip past content filters. Grok processed this user-supplied X content as a trusted directive with no separation between conversation input and authorized commands, then relayed the decoded instruction to the linked Bankrbot agent, which executed it as a legitimate order. Combined with a previously transferred Bankr Club Membership NFT that granted elevated 'Executive' wallet permissions, Bankrbot sent about 3 billion DRB tokens (roughly $150,000) on the Base network to the attacker's wallet, with no human-in-the-loop or circuit breaker on the high-value transfer. About 80% of the funds were later returned after the community identified the attacker.
- CRITICALAI-COPILOT-CAMOLEAK-2025
Legit Security disclosed CamoLeak (CVSS 9.6), a critical vulnerability in GitHub Copilot Chat enabling silent exfiltration of private source code and secrets. The attack combined remote prompt injection via hidden pull-request comments with a CSP bypass that abused GitHub's own Camo image proxy: injected instructions made Copilot extract sensitive repo context, encode it character-by-character into a pre-generated dictionary of Camo image URLs, and leak it through image requests to an attacker server. GitHub mitigated it by disabling image rendering in Copilot Chat in August 2025.
- CRITICALAI-FORCEDLEAK-AGENTFORCE-2025
Disclosed on September 25, 2025 by Noma Security, ForcedLeak is a CVSS 9.4 indirect prompt-injection chain in Salesforce Agentforce affecting organizations with Web-to-Lead enabled. An attacker submits a public Web-to-Lead form and plants hidden instructions in the Description field, chosen because its roughly 42,000-character limit allows complex multi-step directives. When an employee later asks the Agentforce AI agent to process or summarize that lead, the agent ingests the attacker-controlled text as part of its context and executes the embedded commands, querying and reading internal CRM data such as lead email addresses and other contact and sales-pipeline information. The agent then exfiltrates the harvested data by embedding it in an image or link request to an expired Salesforce-related domain that remained on the Content Security Policy allow-list and was re-registered by researchers for about $5, bypassing egress controls. Salesforce remediated it on September 8, 2025 by re-securing the expired domain and enforcing Trusted URLs for Agentforce and Einstein AI; no CVE was assigned because the issue did not stem from a software version flaw.
- HIGHAI-LENOVO-LENA-XSS-2025
In 2025 Cybernews researchers disclosed that Lenovo's GPT-4-based customer-service chatbot 'Lena' could be turned into a cross-site scripting vector through a single prompt injection. A roughly 400-character prompt opened with a normal product question, then instructed the bot to format its reply as HTML and to include an image tag whose source pointed at an attacker-controlled server, insisting the image must be shown. Because the chatbot's output was rendered in the browser without sanitization or output encoding, the untrusted instruction flowed straight into live HTML, and the forced image request caused the victim's browser to call the attacker server and leak active session cookies. The impact extended to support staff: when a chat was escalated, the human agent's workstation rendered the stored malicious HTML, exposing the agent's session and enabling potential session hijacking, redirects, or malware prompts. Cybernews reported finding the flaw on July 22, 2025; Lenovo acknowledged it on August 6, 2025 and deployed fixes by August 18, 2025. The root cause was treating model output as trusted markup and rendering it without filtering.