All vulnerabilities
HIGHAI/LLM

AI-SKELETON-KEY-2024

LLM security · Skeleton Key jailbreak

Summary

Skeleton Key, disclosed by Microsoft's Mark Russinovich, is a multi-turn jailbreak that convinces a model to augment rather than replace its safety guidelines, agreeing to answer any request but prefixing potentially harmful output with a warning instead of refusing. Once the model accepts this behavior change, it complies with otherwise-restricted requests across categories such as explosives, bioweapons, self-harm and violence. Microsoft tested it against models from Meta, Google, OpenAI, Mistral, Anthropic and Cohere, with most complying fully. It is a jailbreak technique rather than an exploited product vulnerability.

How to avoid it in your code

  • Apply input and output guardrails plus content filtering to catch jailbreak prompts and unsafe responses.
  • Harden the system prompt and enforce safety policies that resist multi-turn behavior-change attempts.
  • Apply least privilege so a jailbroken model cannot reach sensitive tools or data.
  • Monitor and rate-limit multi-turn conversations for safety-bypass patterns.
  • Keep models updated to versions with improved jailbreak resistance.

References

Related vulnerabilities

All AI/LLM →