AI-SKELETON-KEY-2024

LLM security · Skeleton Key jailbreak

Summary

Skeleton Key, disclosed by Microsoft's Mark Russinovich, is a multi-turn jailbreak that convinces a model to augment rather than replace its safety guidelines, agreeing to answer any request but prefixing potentially harmful output with a warning instead of refusing. Once the model accepts this behavior change, it complies with otherwise-restricted requests across categories such as explosives, bioweapons, self-harm and violence. Microsoft tested it against models from Meta, Google, OpenAI, Mistral, Anthropic and Cohere, with most complying fully. It is a jailbreak technique rather than an exploited product vulnerability.

How to avoid it in your code

Apply input and output guardrails plus content filtering to catch jailbreak prompts and unsafe responses.
Harden the system prompt and enforce safety policies that resist multi-turn behavior-change attempts.
Apply least privilege so a jailbroken model cannot reach sensitive tools or data.
Monitor and rate-limit multi-turn conversations for safety-bypass patterns.
Keep models updated to versions with improved jailbreak resistance.

References

Related vulnerabilities

All AI/LLM →