HIGHAI/LLMexploited in the wild

AI-MEMORY-POISONING-2024

LLM security · Agent memory poisoning

Summary

Agent memory poisoning is a persistent prompt-injection class where attacker instructions delivered through untrusted content are written into an assistant's long-term memory, so the directive survives across future independent sessions. The low-level mechanism abuses the model's memory tool: indirect injection (for example a malicious web page or document the model summarizes) causes the agent to invoke its memory function and store an attacker-controlled instruction, which is then re-loaded into every subsequent conversation's context. Johann Rehberger demonstrated this as 'SpAIware' on September 20, 2024 against the ChatGPT macOS app, chaining memory injection with an image-rendering exfiltration channel that bypassed the url_safe mitigation to continuously leak conversations to an attacker server; he showed the same delayed-tool-invocation memory poisoning against Google Gemini in February 2025. The class maps to OWASP LLM01:2025 Prompt Injection and improper output/memory handling.

How to avoid it in your code

Treat memory writes as sensitive actions requiring explicit user confirmation before persisting.
Show, log and let users review or delete every stored memory entry.
Isolate untrusted retrieved content from instruction-execution context during summarization.
Block outbound image/URL rendering to non-allowlisted domains to cut exfiltration channels.
Apply content classifiers to detect injection and delayed-trigger patterns in inputs.

References

Related vulnerabilities

All AI/LLM →