AI-EXCESSIVE-AGENCY-2025
LLM security · Excessive agency / unsafe tool use
Summary
Excessive agency is the class of vulnerabilities in which an LLM agent is granted broad tool or function access (file system, shell, email send, database writes, payments) and acts on manipulated model output without per-action authorization, so that any successful prompt injection turns into real, damaging actions. OWASP LLM06:2025 Excessive Agency (published November 17, 2024) decomposes the root causes into excessive functionality (extensions exposing more than needed, e.g. a doc-reader tool that can also delete), excessive permissions (downstream credentials with UPDATE/INSERT/DELETE when only SELECT is required), and excessive autonomy (high-impact operations executed without confirmation). The canonical exploit chain is an indirect prompt injection inside an incoming email that drives the agent to scan the inbox for sensitive data and forward it to the attacker, which works because the agent has both send-mail capability and standing authority to act.
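The excessive-permissions case can be made concrete with a small sketch. It assumes a throwaway SQLite file standing in for the downstream data store; the helper names (`make_demo_db`, `open_for_agent`, `agent_can_write`) and the `docs` table are hypothetical and not part of any real agent framework. The point is only that a connection opened read-only still serves SELECT while rejecting the DELETE an injected instruction might request.

```python
import os
import sqlite3
import tempfile


def make_demo_db() -> str:
    """Create a temporary SQLite file with one table (stand-in for a real store)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO docs (title) VALUES ('quarterly report')")
    conn.commit()
    conn.close()
    return path


def open_for_agent(path: str) -> sqlite3.Connection:
    """Open the store the way the agent should see it: read-only."""
    # mode=ro is a SQLite URI option; writes on this connection are refused.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def agent_can_write(conn: sqlite3.Connection) -> bool:
    """Return True only if a destructive statement succeeds on this connection."""
    try:
        conn.execute("DELETE FROM docs")
    except sqlite3.OperationalError:
        return False
    return True
```

In a production database the equivalent would be a dedicated account granted SELECT only; the read-only URI here is just the smallest self-contained way to show the effect.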
How to avoid it in your code
- Minimize tool scope and replace open-ended functions like 'run shell command' with narrow, purpose-built actions.
- Grant downstream systems least-privilege credentials scoped to the agent's actual need.
- Require human-in-the-loop approval for high-impact actions such as send, delete or payment.
- Execute tools in the requesting user's context, not a privileged shared account.
- Mediate and validate every downstream request against an authorization policy.
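A minimal sketch of how several of the points above might fit together, under stated assumptions: `ToolGate`, `HIGH_IMPACT`, `read_document` and `send_email` are hypothetical names invented for illustration, and the `approve` callback stands in for whatever confirmation UI a real deployment would use. Every call the model proposes goes through one mediation point that checks the tool is exposed, that the requesting user holds the permission, and that high-impact actions are approved by a human.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical tool names treated as high impact in this sketch.
HIGH_IMPACT: Set[str] = {"send_email", "delete_document", "make_payment"}


def read_document(doc_id: str) -> str:
    """Narrow, read-only tool: no delete or write path is exposed."""
    return f"contents of {doc_id}"


def send_email(to: str, body: str) -> str:
    """High-impact tool; a real version would call a mail API."""
    return f"sent to {to}"


@dataclass
class ToolGate:
    """Mediates every tool call the model proposes (illustrative only)."""

    tools: Dict[str, Callable[..., str]]   # the only tools the agent can reach
    user_permissions: Set[str]             # what the requesting user may do
    approve: Callable[[str, dict], bool]   # human-in-the-loop callback

    def call(self, name: str, **args) -> str:
        # 1. Only purpose-built tools are exposed; there is no generic shell.
        if name not in self.tools:
            raise PermissionError(f"tool not exposed: {name}")
        # 2. Run in the requesting user's context, not a shared account.
        if name not in self.user_permissions:
            raise PermissionError(f"user may not call: {name}")
        # 3. High-impact actions need explicit human approval.
        if name in HIGH_IMPACT and not self.approve(name, args):
            raise PermissionError(f"approval denied: {name}")
        return self.tools[name](**args)


gate = ToolGate(
    tools={"read_document": read_document, "send_email": send_email},
    user_permissions={"read_document", "send_email"},
    approve=lambda name, args: False,  # stand-in for a real confirmation prompt
)
```

With this gate, `gate.call("read_document", doc_id="q3-report")` succeeds, while `gate.call("send_email", ...)` raises `PermissionError` until the approval callback returns `True`, and a request for an unlisted tool such as `run_shell` is rejected outright.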
Related vulnerabilities
- CRITICAL · AI-GROK-BANKR-WALLET-2026
In early May 2026 an attacker used prompt injection to drain roughly $150,000 from an AI-powered crypto trading agent on X (Twitter). The exploit of Grok and the linked Bankrbot agent was documented by AI-security researchers including Giskard and NeuralTrust. The attacker posted a Morse-code-encoded message on X and asked Grok to translate it; Grok decoded the obfuscated payload, which contained hidden financial instructions, and the encoding let the untrusted post slip past content filters. Grok processed this user-supplied X content as a trusted directive, with no separation between conversation input and authorized commands, then relayed the decoded instruction to the linked Bankrbot agent, which executed it as a legitimate order. Because a previously transferred Bankr Club Membership NFT had granted elevated 'Executive' wallet permissions, Bankrbot sent about 3 billion DRB tokens (roughly $150,000) on the Base network to the attacker's wallet, with no human-in-the-loop check or circuit breaker on the high-value transfer. About 80% of the funds were later returned after the community identified the attacker.
- HIGH · AI-GEMINI-INVITATION-PROMPTWARE-2025
Presented at Black Hat USA 2025 and DEF CON 33 and published August 6, 2025 by SafeBreach researchers Ben Nassi, Stav Cohen and Or Yair, this indirect prompt injection (dubbed 'promptware') hijacks Google Gemini through poisoned Google Calendar invites, emails and shared documents. An attacker sends the victim a calendar invite whose title contains hidden instructions; the malicious text sits unnoticed because long event lists hide entries behind a 'Show more' control yet still enter Gemini's context. When the victim later makes a routine request of Gemini, such as summarizing their schedule, the agent ingests the attacker's calendar data as trusted context and executes the embedded directives, abusing Gemini's connected agents and tool permissions. Demonstrated real-world effects included controlling Google Home smart devices to open windows, turn off lights and activate a boiler, as well as geolocating the victim, starting a Zoom video stream, deleting calendar events and exfiltrating email content. The researchers privately disclosed to Google in February 2025, and Google deployed layered mitigations including user confirmations, URL sanitization and prompt-injection detection before publication.
- HIGH · AI-CLAUDECODE-SOURCEMAP-2026
On March 31, 2026, Anthropic accidentally shipped the full source of its Claude Code CLI inside a published npm package. A missing .npmignore rule for *.map left a roughly 59.8 MB source map in the tarball, embedding about 512,000 lines of unobfuscated TypeScript across some 1,900 files, including internal prompts, tool definitions and architecture. The root cause was a packaging failure compounded by a bundler bug: Bun continued emitting source maps even when generation was disabled, and nothing stripped or excluded them before publish. Because npm releases are immutable and mirrored instantly, the source was cloned, dissected and re-hosted within hours, and a clean-room reimplementation reached tens of thousands of GitHub stars the same day. It is a textbook source-map disclosure: the sourcesContent field of a .map file carries the original code verbatim, so a single map left in a shipped artifact hands an attacker the entire codebase, comments and all. The same class hit Apple's App Store web front-end in November 2025, where production source maps left enabled let a researcher reconstruct and publish the full client source.
- MEDIUM · AI-SECRETS-SPRAWL-2025
GitGuardian's State of Secrets Sprawl research found that AI coding assistants are driving a surge in leaked credentials on public GitHub. AI-assisted commits leaked secrets at roughly twice the baseline rate, with Claude Code-assisted commits showing a 3.2% leak rate versus 1.5% for human-only commits, contributing to 28.65 million new hardcoded secrets added to public GitHub in 2025 (a 34% year-over-year increase). The study also found 24,008 unique secrets in MCP configuration files, where setup guides often instruct developers to paste API keys directly into config.
- CRITICAL · AI-COPILOT-CAMOLEAK-2025
Legit Security disclosed CamoLeak (CVSS 9.6), a critical vulnerability in GitHub Copilot Chat enabling silent exfiltration of private source code and secrets. The attack combined remote prompt injection via hidden pull-request comments with a CSP bypass that abused GitHub's own Camo image proxy: injected instructions made Copilot extract sensitive repo context, encode it character-by-character into a pre-generated dictionary of Camo image URLs, and leak it through image requests to an attacker server. GitHub mitigated it by disabling image rendering in Copilot Chat in August 2025.
- CRITICAL · AI-FORCEDLEAK-AGENTFORCE-2025
Disclosed on September 25, 2025 by Noma Security, ForcedLeak is a CVSS 9.4 indirect prompt-injection chain in Salesforce Agentforce affecting organizations with Web-to-Lead enabled. An attacker submits a public Web-to-Lead form and plants hidden instructions in the Description field, chosen because its roughly 42,000-character limit allows complex multi-step directives. When an employee later asks the Agentforce AI agent to process or summarize that lead, the agent ingests the attacker-controlled text as part of its context and executes the embedded commands, querying and reading internal CRM data such as lead email addresses and other contact and sales-pipeline information. The agent then exfiltrates the harvested data by embedding it in an image or link request to an expired Salesforce-related domain that remained on the Content Security Policy allow-list and was re-registered by researchers for about $5, bypassing egress controls. Salesforce remediated it on September 8, 2025 by re-securing the expired domain and enforcing Trusted URLs for Agentforce and Einstein AI; no CVE was assigned because the issue did not stem from a software version flaw.
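Several of the incidents above exfiltrate data by getting the agent to emit an image or link whose URL carries the stolen content. One common countermeasure is to filter URLs in agent output against a list of trusted hosts before anything is rendered or fetched. The following is a rough sketch of that idea, not Salesforce's or GitHub's actual mitigation: `TRUSTED_HOSTS`, `untrusted_urls`, `redact_untrusted` and the example domains are all assumptions made for illustration.

```python
import re
from typing import List, Set
from urllib.parse import urlsplit

# Hypothetical allow-list; a real one must be reviewed so that expired or
# abandoned domains do not linger on it.
TRUSTED_HOSTS: Set[str] = {"crm.example.com"}

# Deliberately simple URL matcher for this sketch (stops at whitespace,
# quotes, angle brackets and a closing parenthesis).
URL_RE = re.compile(r"https?://[^\s)\"'<>]+")


def _host(url: str) -> str:
    """Return the lower-cased hostname, or '' if the URL cannot be parsed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def untrusted_urls(text: str, trusted: Set[str] = TRUSTED_HOSTS) -> List[str]:
    """List every URL in agent output whose host is not exactly allow-listed."""
    return [u for u in URL_RE.findall(text) if _host(u) not in trusted]


def redact_untrusted(text: str, trusted: Set[str] = TRUSTED_HOSTS) -> str:
    """Replace non-allow-listed URLs so they are never rendered or fetched."""
    def repl(match: "re.Match[str]") -> str:
        url = match.group(0)
        return url if _host(url) in trusted else "[link removed]"
    return URL_RE.sub(repl, text)
```

A filter like this is only as good as its list: in the ForcedLeak case described above, the exfiltration domain was itself still on the allow-list, so the list needs the same review and expiry tracking as any other credential.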