AI-SYDNEY-2023

LLM security · Indirect prompt injection (Bing Chat / Sydney)

Résumé

Greshake et al. demonstrated that adversaries can remotely compromise LLM-integrated applications by planting malicious prompts in data the model later retrieves, such as a web page, rather than typing them directly. Because the model cannot separate trusted instructions from retrieved data, the injected text is executed as new instructions. They showed practical indirect prompt-injection attacks against Bing's GPT-4-powered Chat and code-completion engines, enabling data theft, manipulation of application behavior and control over API invocations. The work established indirect prompt injection as a real-world attack class.

Comment l’éviter dans votre code

Treat all retrieved web/document content as untrusted data, never as instructions.
Isolate retrieved data from the instruction context with clear trust boundaries.
Restrict and gate API/tool invocations the model can trigger; require approval for privileged ones.
Apply input guardrails and output sanitization before acting on or rendering model results.
Scope egress to an allow-list so injected instructions cannot exfiltrate or call external APIs.

Références

Vulnérabilités liées

Tout AI/LLM →