AI-DATA-MODEL-POISONING-2023
LLM security · Training-data / RAG poisoning
Summary
Training-data and RAG poisoning is a class in which an attacker injects malicious or backdoored data into a model's pre-training set, fine-tuning corpus or retrieval-augmented-generation knowledge base so the model emits attacker-chosen outputs, often gated behind a specific trigger. The mechanism can be surgical: Mithril Security's PoisonGPT (July 9, 2023) used Rank-One Model Editing (ROME) to overwrite a single factual association in GPT-J-6B so it asserted Yuri Gagarin was the first man on the Moon, while remaining within roughly 0.1% of the original model's benchmark accuracy and thus undetectable by standard evaluation. They distributed it on Hugging Face under the typosquatted name 'EleuterAI' to mimic the legitimate EleutherAI lab, illustrating the supply-chain reach; analogous RAG poisoning seeds malicious documents into a vector store so retrieval injects them at query time. The class maps to OWASP LLM04:2025 Data and Model Poisoning.
How to avoid it in your code
- Verify model and dataset provenance via signing, checksums or attestation before use.
- Source models from trusted publishers and guard against typosquatted repository names.
- Vet, sanitize and access-control documents ingested into RAG knowledge bases.
- Track data lineage and use anomaly detection on training and fine-tuning corpora.
- Red-team models with trigger and backdoor probes beyond standard accuracy benchmarks.
References
Related vulnerabilities
All AI/LLM →- HIGHAI-SHADOWLEAK-2025
ShadowLeak is a server-side zero-click indirect prompt-injection attack against ChatGPT's Deep Research agent, discovered by Radware. An attacker emails the victim a message with instructions hidden in the HTML using white-on-white text and tiny fonts; when the user runs Deep Research over their inbox, the agent autonomously follows the hidden instructions and exfiltrates personal and inbox data. The distinguishing trait is that exfiltration occurs entirely server-side within OpenAI's cloud infrastructure, making it invisible to local and enterprise network defenses. The Gmail proof of concept generalizes to any Deep Research connector; OpenAI fixed it before public disclosure with no evidence of in-the-wild exploitation.
- MEDIUMAI-GEMINI-WORKSPACE-2025
Marco Figueroa of Mozilla's 0DIN program documented a Gemini for Workspace flaw where an attacker hides instructions inside an email using tags styled with font-size zero or white-on-white text, invisible to the recipient. When the user clicks Summarize this email, Gemini processes the raw HTML and treats the hidden directive as a high-priority instruction, appending an attacker-crafted fake security warning, such as a fake support phone number, that appears to come from Google. No links or attachments are required, enabling credential harvesting and vishing at scale through indirect prompt injection.
- HIGHAI-SLACK-PROMPT-INJECTION-2024
PromptArmor disclosed an indirect prompt-injection data-exfiltration flaw in Slack AI. An attacker with only the ability to post in a public channel plants adversarial instructions; when any Slack AI user later queries the assistant, the model ingests the planted text and follows it. The injection makes Slack AI render a deceptive Markdown link whose URL encodes private-channel data in the query string, so clicking it exfiltrates the secret to the attacker's server. A subsequent Slack update that added files from channels and DMs to AI answers widened the attack surface.
- HIGHAI-LIVING-OFF-COPILOT-2024
At Black Hat USA 2024, Michael Bargury of Zenity presented Living off Microsoft Copilot, demonstrating how indirect prompt injection, RAG poisoning and phantom references let an attacker manipulate Microsoft 365 Copilot to exfiltrate sensitive enterprise data, bypass Data Loss Prevention controls, and conduct AI-driven spear-phishing and social engineering. Zenity released red-team tooling including LOLCopilot, CopilotHunter and PowerPwn v3. This was a red-team research demonstration against the live product rather than a single patched CVE.
- HIGHAI-SKELETON-KEY-2024
Skeleton Key, disclosed by Microsoft's Mark Russinovich, is a multi-turn jailbreak that convinces a model to augment rather than replace its safety guidelines, agreeing to answer any request but prefixing potentially harmful output with a warning instead of refusing. Once the model accepts this behavior change, it complies with otherwise-restricted requests across categories such as explosives, bioweapons, self-harm and violence. Microsoft tested it against models from Meta, Google, OpenAI, Mistral, Anthropic and Cohere, with most complying fully. It is a jailbreak technique rather than an exploited product vulnerability.
- HIGHCVE-2024-5565
The Vanna.AI text-to-SQL library exposes an ask() method that, with visualization enabled by default, pipes LLM output through a chain of SQL to Python code to a Plotly visualization rendered with exec(). An attacker supplying crafted natural-language input can use prompt injection to override the intended Plotly code and have arbitrary Python executed on the host, yielding remote code execution. The flaw, discovered by JFrog, affects versions up to and including 0.5.5 and is fixed in 0.5.6 or by disabling visualization for external input.