OWASP AI Agent Security: all 10 risks, mapped to Kernel

A risk-by-risk walkthrough of how Kernel's decision enforcement layer covers the OWASP Agentic Top 10, and an honest map of where each control actually lives.

Hugo Hernandez · Co-founder & CEO

May 22, 2026 · 12 min read

TL;DR

The OWASP AI Agent Security Cheat Sheet is the clearest public checklist of the failure modes specific to autonomous agents. The hard question it leaves unanswered is *where in your stack you actually enforce against each one*. This post walks through the ten risks one by one and names the Kernel control that addresses each, using three coverage labels (full block, containment, and auditable detection) so you can see exactly what a decision enforcement layer can and cannot deliver.

In one sentence: A decision enforcement layer like Kernel sits between the agent and the world it acts on, so most of the OWASP Agentic Top 10 risks become a verdict (ALLOW / DENY / REQUIRE_HUMAN) instead of an incident.

Quick reference: the ten risks, mapped

Prompt Injection: Intent Anchoring + identity binding. *Containment.*
Tool Abuse & Privilege Escalation: Declared scope + stateful session policy. *Full block.*
Data Exfiltration: Envelope encryption (AES-GCM + RSA-OAEP) with the agent as blind courier. *Full block (known classes) + detection (novel).*
Memory Poisoning: Session TTL + immutable audit log + circuit breaker. *Containment + forensics.*
Goal Hijacking: Intent Anchoring + audit + session TTL. *Detection + intervention.*
Cascading Failures: A2A chain governance + permission intersection. *Full block (in-path).*
AI Console Malicious Configuration: Immutable audit + signed config + admin guardrails. *Full block (in-path).*
Denial of Wallet: Rate limiting + loop detection + circuit breaker. *Full block.*
Sensitive Data Exposure: Envelope-encrypted payloads + PII/secret detection + tamper-evident audit logs. *Full block (declared) + detection (inferred).*
Supply Chain Attacks: Endpoint allowlists + drift detection. *Scope reduction + detection.*
Bonus, Excessive Autonomy: REQUIRE_HUMAN verdict. *Full block.*

Why this post exists

The OWASP AI Agent Security Cheat Sheet did something the industry needed. It took the sprawling, badly-named risk surface of autonomous AI agents and pinned ten clear failure modes to the wall. Anyone shipping agents, especially into financial services, healthcare, or any regulated environment, now has a shared vocabulary for the conversation that used to start with hand-waving and end with "we'll figure it out."

The list is good. The honest follow-up question is harder: where in your stack do you actually enforce against it?

Most teams discover the answer the slow way. They wire prompt-filtering libraries at one end, role-based access at another, a manual approval workflow in Slack for the actions that scare them most, and a logging pipeline they hope someone will read. Each layer addresses one or two risks. The seams between them are where incidents live.

Kernel was built around a different assumption: an agent's risk surface is not eleven separate problems (the ten OWASP risks plus the bonus covered at the end of this post); it is one problem expressed eleven ways. The problem is that an LLM-driven process can now reach for consequential actions and consequential data, and nothing in the traditional stack was designed to ask, in real time, *"should this specific action, by this specific agent, with this specific intent, be allowed to execute?"*

Kernel asks that question. Before any consequential action executes, the agent's backend calls Kernel.check(), and the answer comes back as one of three verdicts: ALLOW, DENY, or REQUIRE_HUMAN. That single decision point, sitting between the agent and the world it acts on, is where Kernel addresses all ten OWASP AI agent security risks, with the coverage type varying by risk as mapped below.

How Kernel is structured

Three pillars. Guardrails (universal protections that ship turned on), Intent Anchoring (a session-scoped record of *why* the agent was deployed, evaluated against every action it then takes), and envelope-encrypted data handling with the agent as a blind courier: sensitive fields are sealed with per-record AES-GCM data keys wrapped by RSA-OAEP under a key your KMS custodies, so the agent transports ciphertext it cannot read and only the destination service unwraps it. Every one of those operations, plus every verdict, is written to a tamper-evident, Ed25519-signed audit log, which is what auditors actually ask for.

Three tiers of policy. Tier 1 is the universal baseline that applies to every agent (least-privilege defaults, rate limiting, identity binding, audit logging). Tier 2 is the control group (circuit breakers and budget controls that catch runaway behaviour at the session level). Tier 3 is vertical policy packs (parameterised rule sets for specific agent types: payment agents, KYC agents, hiring agents).

Three verdicts. ALLOW lets the action through. DENY blocks it and records the reason. REQUIRE_HUMAN pauses the agent, routes the decision to a designated approver via Slack, email, or webhook, and resumes the agent with the human's decision attached. That third verdict is the one most stacks don't have, and it is what makes the difference between an agent that can only read and an agent that can safely write.
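To make the three-verdict model concrete, here is a minimal sketch of a decision point that an agent backend consults before acting. Everything in it is an assumption for illustration: the `Action` fields, the `ALLOWED_TOOLS` set, the `HUMAN_APPROVAL_ABOVE` threshold, and the `check` function are hypothetical stand-ins, not Kernel's actual API or policy format.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_HUMAN = "REQUIRE_HUMAN"


@dataclass(frozen=True)
class Action:
    agent_id: str
    tool: str            # hypothetical tool name, e.g. "payments.transfer"
    amount: float = 0.0


# Hypothetical policy values, chosen only for this example.
ALLOWED_TOOLS = {"crm.read", "payments.transfer"}
HUMAN_APPROVAL_ABOVE = 1_000.0


def check(action: Action) -> Verdict:
    """Toy stand-in for the decision point described above."""
    if action.tool not in ALLOWED_TOOLS:
        return Verdict.DENY
    if action.tool == "payments.transfer" and action.amount > HUMAN_APPROVAL_ABOVE:
        return Verdict.REQUIRE_HUMAN
    return Verdict.ALLOW


def execute(action: Action) -> str:
    """The backend asks for a verdict first and only acts on ALLOW."""
    verdict = check(action)
    if verdict is Verdict.ALLOW:
        return "executed"
    if verdict is Verdict.REQUIRE_HUMAN:
        return "paused for approval"
    return "blocked"
```

The point of the shape is that the agent never decides for itself: the write path only exists behind the verdict, and the "pause" branch is a first-class outcome rather than an exception handler.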

1. Prompt Injection (direct and indirect)

Risk. Malicious instructions injected into user input or external data sources (webpages, documents, emails the agent reads) that hijack the agent's behaviour. The agent cannot reliably distinguish "data to process" from "instructions to follow."

Kernel control: Intent Anchoring + identity binding. When an agent session starts, Kernel records the agent's declared intent, the task it was deployed to perform. Every subsequent check() call evaluates the requested action against that anchor. A customer-support agent that suddenly tries to issue a wire transfer fails the intent check regardless of how persuasive the injected instructions were, because the action is incoherent with the recorded task.

Coverage type: containment, not prevention. Kernel does not attempt to detect prompt injection at the LLM input layer; that is a model-layer concern, and pattern-matching defences at that layer are gameable. What Kernel guarantees is that even if the injection succeeds inside the model, the consequential action it triggers cannot execute without passing the intent check. The blast radius shrinks from "anything the agent's credentials can reach" to "actions consistent with the agent's declared purpose." The injection becomes a logged event, not an incident.
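A minimal sketch of the anchoring idea, assuming the simplest possible representation: the anchor carries an explicit set of action types that are coherent with the declared purpose. The `IntentAnchor` type, its fields, and the action names are hypothetical; how Kernel actually evaluates coherence is not specified in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentAnchor:
    purpose: str
    coherent_actions: frozenset  # action types consistent with the purpose


def intent_check(anchor: IntentAnchor, requested_action: str) -> str:
    """Deny anything that is incoherent with the recorded task."""
    return "ALLOW" if requested_action in anchor.coherent_actions else "DENY"


# Recorded once at session start (illustrative values).
support_anchor = IntentAnchor(
    purpose="answer customer support tickets",
    coherent_actions=frozenset({"ticket.read", "ticket.reply", "order.lookup"}),
)

print(intent_check(support_anchor, "ticket.reply"))
print(intent_check(support_anchor, "wire.transfer"))
```

Note that the check never looks at the prompt or the model's reasoning. However the agent was persuaded, a wire transfer is simply not in the set of actions a support session can take.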

2. Tool Abuse and Privilege Escalation

Risk. The agent chains together individually-permitted tool calls to reach an outcome beyond its intended scope. Each step looks fine; the sequence is what causes harm.

Kernel control: per-agent declared scope + stateful session policy. At registration, every agent declares its allowed tools, its data scope, and its role. Kernel enforces those constraints on every action, not as a static IAM check but as a session-aware policy evaluated against the agent's actual sequence of calls. Repeated reads followed by a destructive write outside the declared scope trips the policy. So does a sequence that bypasses a tool the agent was explicitly told to use (for example, skipping the KYC service before account creation).

Coverage type: full block at the action plane. This is one of Kernel's strongest surfaces, because the scope and tool declaration is structured data the policy engine can reason about deterministically. Where the sequence is genuinely ambiguous, the policy can route to REQUIRE_HUMAN rather than guessing.
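The difference between a static permission check and a session-aware one can be sketched in a few lines. This example assumes a hypothetical `SessionPolicy` with a declared tool set and a prerequisite map; the tool names are invented, and a real policy engine would track far more than call order.

```python
from dataclasses import dataclass, field


@dataclass
class SessionPolicy:
    allowed_tools: set
    # tool -> tools that must already have succeeded in this session
    prerequisites: dict
    history: list = field(default_factory=list)

    def check(self, tool: str) -> str:
        # Static part: is the tool inside the declared scope at all?
        if tool not in self.allowed_tools:
            return "DENY"
        # Stateful part: has the session done what must come first?
        missing = [p for p in self.prerequisites.get(tool, []) if p not in self.history]
        if missing:
            return "DENY"
        self.history.append(tool)
        return "ALLOW"


policy = SessionPolicy(
    allowed_tools={"kyc.verify", "account.create"},
    prerequisites={"account.create": ["kyc.verify"]},
)
```

With this policy, `account.create` is denied until `kyc.verify` has been allowed in the same session, even though both tools are individually permitted. That is the "each step looks fine; the sequence causes harm" case expressed as data.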

3. Data Exfiltration

Risk. Sensitive data in the agent's context window leaks outward, through tool call parameters, through crafted outputs encoding data into seemingly innocent fields, or simply by the agent being instructed to send it somewhere.

Kernel control: envelope encryption with the agent as blind courier. Sensitive fields (PII, credentials, PANs, IBANs) are sealed at the boundary using envelope encryption: a per-record AES-256-GCM data encryption key encrypts the value, and that data key is wrapped with RSA-OAEP under a master key your KMS custodies. The agent receives only the ciphertext and a stable reference handle. It carries that ciphertext from source to destination as a blind courier: it cannot decrypt it, cannot reason over the plaintext, and cannot leak what it never held. Only the policy-permitted destination service unwraps the data key and reads the value. Ciphertext that escapes into outputs, logs, or downstream prompts is inert by construction. Every seal, transport, and unwrap is written to a tamper-evident, Ed25519-signed audit log.

Coverage type: full block for known classes; auditable detection for novel formats. Envelope encryption operates at protection levels selected per regulatory regime (GDPR, HIPAA, PCI). Coverage for declared sensitive fields is deterministic: the agent never sees the plaintext. For novel or semantically-encoded sensitive content that wasn't declared, a scan at the same boundary acts as a detection signal, feeding the audit log and triggering a REQUIRE_HUMAN verdict on high-egress actions. Anyone who tells you the semantic-encoding problem is fully solved is selling you something. Kernel narrows the surface aggressively and surfaces the rest to a human.
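For readers who have not worked with envelope encryption, the sketch below shows the general pattern described above using the third-party `cryptography` package: a fresh AES-256-GCM data key per record, wrapped with RSA-OAEP. It is an illustration of the technique, not Kernel's implementation. The `seal` and `unseal` names and the envelope layout are hypothetical, and the RSA key is generated locally only so the example runs; in the setup the post describes, the private key would stay inside a KMS.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def seal(plaintext: bytes, master_public_key) -> dict:
    """Encrypt one record with its own data key, then wrap that key."""
    data_key = AESGCM.generate_key(bit_length=256)  # per-record data key
    nonce = os.urandom(12)                          # 96-bit GCM nonce
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = master_public_key.encrypt(data_key, OAEP)
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_key": wrapped_key}


def unseal(envelope: dict, master_private_key) -> bytes:
    """Only the holder of the private key can unwrap the data key."""
    data_key = master_private_key.decrypt(envelope["wrapped_key"], OAEP)
    return AESGCM(data_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)


# Assumption for the demo: a locally generated key stands in for the KMS key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
envelope = seal(b"DE89370400440532013000", private_key.public_key())

# The agent would carry `envelope` without ever holding `private_key`.
recovered = unseal(envelope, private_key)
```

The agent-side property falls out of who holds what: the envelope is all the courier ever sees, and without the private key neither the data key nor the value is recoverable from it.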

4. Memory Poisoning

Risk. Persistent agent memory (vector stores, session logs, long-term context) is fed malicious data that gets retrieved in later sessions, potentially serving different users. The poisoned memory acts as a persistent backdoor.

Kernel control: session TTL + immutable audit log + circuit breaker. Kernel enforces session TTLs at the identity level: an agent's session-scoped permissions and intent anchor expire on a defined cadence, preventing a poisoned memory from carrying authority forward indefinitely. The immutable audit log captures every action and every retrieval that influenced an action, so a downstream investigation can trace which memory write produced which later behaviour. The circuit breaker halts agent sessions exhibiting drift consistent with poisoned-memory influence.

Coverage type: containment + forensics. Memory poisoning cannot be fully prevented from outside the memory layer itself. What Kernel guarantees today is that a poisoned memory cannot extend its blast radius indefinitely, and the path from poison to incident is fully reconstructable from the audit log. For regulated customers, that audit traceability is often more valuable than detection, because it is what auditors actually demand.
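The TTL part of this control is easy to picture in code. The sketch below assumes a hypothetical `Session` record whose authority lapses after `ttl_seconds`; the field names and the 15-minute value are illustrative only.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    agent_id: str
    intent: str
    started_at: float   # seconds since the epoch
    ttl_seconds: float

    def is_expired(self, now: float) -> bool:
        return now - self.started_at >= self.ttl_seconds


def check_session(session: Session, now: Optional[float] = None) -> str:
    """Expired sessions carry no authority, whatever memory they hold."""
    now = time.time() if now is None else now
    return "DENY" if session.is_expired(now) else "ALLOW"


session = Session(
    agent_id="kyc-agent-7",
    intent="verify onboarding documents",
    started_at=1_000.0,
    ttl_seconds=900.0,  # illustrative 15-minute TTL
)
```

Whatever a poisoned memory persuades the agent to attempt, it has to attempt it inside a live session. Once the TTL lapses, a new session needs a new anchor, which is where the poison stops inheriting authority.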

5. Goal Hijacking

Risk. Through crafted inputs, the agent's objective is gradually or abruptly redirected. The agent continues behaving plausibly, executing real tasks, producing real outputs, but in service of attacker goals. Particularly dangerous in long-horizon agentic workflows.

Kernel control: Intent Anchoring + audit log + session TTL. Goal hijacking is the canonical Intent Anchoring use case. The anchor is recorded once, at session start, with the deployed purpose. Drift across the session is evaluated continuously: each action's coherence with the anchor is scored, and when coherence drops below a configurable threshold the agent is paused for human review. Session TTLs cap how long any single anchor can remain authoritative.

Coverage type: detection + intervention, not prevention. The combination of intent scoring, session TTL, and the REQUIRE_HUMAN escalation path means that hijacked goals surface as a paused agent and a queued approval, not as silent compromise.
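As a rough sketch of threshold-based drift detection: assume some scorer (not shown, and not specified in this post) emits a coherence value between 0 and 1 for each action. The hypothetical `DriftMonitor` below averages the most recent scores and escalates when the average falls under the threshold. The rolling window is a design choice made for this example, so that one odd action does not pause the agent but a sustained slide does.

```python
from collections import deque


class DriftMonitor:
    def __init__(self, threshold: float, window: int = 3):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # most recent coherence scores

    def observe(self, coherence: float) -> str:
        """Record one action's coherence score and return a verdict."""
        self.scores.append(coherence)
        average = sum(self.scores) / len(self.scores)
        return "REQUIRE_HUMAN" if average < self.threshold else "ALLOW"


monitor = DriftMonitor(threshold=0.6, window=3)
verdicts = [monitor.observe(score) for score in (0.9, 0.8, 0.4, 0.3)]
```

In this run the first three actions pass and the fourth trips the escalation, because the window average only drops below 0.6 once two low-coherence actions have accumulated.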

6. Cascading Failures

Risk. In multi-agent architectures, a compromised agent passes malicious instructions or tainted data downstream to other agents that trust it implicitly. Trust is transitive without explicit verification at each hop.

Kernel control: A2A chain governance + permission intersection + circuit breaker. When Agent A calls Agent B, Kernel records the calling context: who delegated, with what authority, under what intent. Agent B's effective permissions are computed as the intersection of its own declared scope and the delegated authority. If Agent A's permissions have been narrowed or its intent has drifted, the downstream chain inherits that narrower scope automatically. The circuit breaker halts the chain at the first node showing anomalous behaviour rather than letting it propagate.

Coverage type: full block at every hop, but only for chains routed through Kernel. This is genuinely solved at the architecture level for any agent-to-agent call that flows through Kernel.check(). Out-of-band agent communication that bypasses Kernel is by definition outside the trust chain.
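Permission intersection is just set intersection applied at every hop. The sketch below uses invented scope names and hypothetical helper functions to show how authority can only narrow as a call moves down a delegation chain.

```python
from functools import reduce


def effective_permissions(own_scope: frozenset, delegated: frozenset) -> frozenset:
    """A callee can use only what it declared AND what was delegated to it."""
    return own_scope & delegated


def chain_permissions(scopes: list) -> frozenset:
    """Fold the intersection along a delegation chain A -> B -> C."""
    return reduce(effective_permissions, scopes)


# Illustrative scopes for three agents in a chain.
agent_a = frozenset({"orders.read", "refunds.create"})
agent_b = frozenset({"orders.read", "refunds.create", "email.send"})
agent_c = frozenset({"orders.read", "email.send"})

b_effective = effective_permissions(agent_b, agent_a)
c_effective = chain_permissions([agent_a, agent_b, agent_c])
```

Agent B declares `email.send`, but when called by Agent A it cannot use it, because A never held that permission to delegate. By the time the chain reaches Agent C, only `orders.read` survives.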

7. AI Console Malicious Configuration

Risk. Developer consoles used to configure LLM behaviour (system prompts, tool definitions, model parameters) can be compelled to consume data containing embedded instructions that alter the underlying configuration. Prompt injection, but at the configuration layer.

Kernel control: immutable audit log + admin guardrails + signed configuration. Configuration changes flowing through Kernel are recorded immutably with the identity of the operator and the diff applied. Tier 1 admin guardrails restrict who can modify which configuration surfaces, and high-impact configuration changes (adding tools, expanding scope, modifying intent anchors) route through REQUIRE_HUMAN by default. The audit log makes every configuration drift visible.

Coverage type: full block for changes routed through Kernel; detection-only for console drift outside Kernel's control surface. If the customer's LLM provider console is mutated directly, outside Kernel's configuration plane, Kernel cannot prevent the change, but the resulting behavioural drift surfaces immediately through Intent Anchoring and the circuit breaker.
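One common way to make a change log tamper-evident is to chain entries by hash, so each entry commits to everything before it. The sketch below shows that idea for configuration changes using only the standard library. It is a simplification under stated assumptions: the record fields are hypothetical, and the per-entry Ed25519 signatures the post describes are deliberately left out, so this demonstrates the chaining half only.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append(log: list, operator: str, diff: dict) -> None:
    """Record who changed what, chained to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"operator": operator, "diff": diff}
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


config_log = []
append(config_log, "alice", {"tools": ["+payments.transfer"]})
append(config_log, "bob", {"scope": "eu-only"})
```

Editing any earlier record after the fact changes its hash, which no longer matches what the next entry committed to, so `verify` returns `False`. Signatures would additionally stop an attacker from rebuilding the whole chain.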

8. Denial of Wallet

Risk. Adversarial inputs trigger unbounded agent loops (infinite tool call chains, recursive subagent spawning, repeated LLM completions), running up compute and API costs at speed.

Kernel control: rate limiting + loop detection + circuit breaker. Tier 1 rate limits cap calls per agent, per session, and per tool. The circuit breaker detects repeated identical tool calls and pathological subagent spawning patterns and halts the session. Hard caps on tokens and calls per session prevent any single agent from sustaining a runaway burn.

Coverage type: full block. This is the cleanest OWASP category for a decision enforcement layer. Cost-driven attacks have a measurable signature, the enforcement layer sits exactly where the signature is visible, and the response (halt the loop) is unambiguous. If Kernel is in the path, Denial of Wallet is solved.
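A minimal sketch of the two signals named above, a hard call budget and repeated identical tool calls, combined in one hypothetical `CircuitBreaker`. The limits and the definition of "identical" (same tool, same arguments, back to back) are assumptions for the example.

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, max_calls: int, max_identical_repeats: int):
        self.max_calls = max_calls
        self.max_identical_repeats = max_identical_repeats
        self.calls = 0
        self.recent = deque(maxlen=max_identical_repeats)
        self.tripped = False

    def check(self, tool: str, args: tuple) -> str:
        if self.tripped:
            return "DENY"
        self.calls += 1
        self.recent.append((tool, args))
        # Loop signal: the last N calls are all the same tool with the same args.
        looping = (
            len(self.recent) == self.max_identical_repeats
            and len(set(self.recent)) == 1
        )
        if self.calls > self.max_calls or looping:
            self.tripped = True  # halt the whole session, not just this call
            return "DENY"
        return "ALLOW"


breaker = CircuitBreaker(max_calls=100, max_identical_repeats=3)
loop_results = [breaker.check("search", ("same query",)) for _ in range(3)]
```

Once tripped, the breaker stays tripped for the session. That is the difference between rate limiting, which slows a runaway loop, and a circuit breaker, which ends it.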

9. Sensitive Data Exposure

Risk. PII, credentials, or confidential data inadvertently included in agent context, logs, or downstream API calls. Often the agent doesn't know the data was sensitive; nothing told it.

Kernel control: envelope-encrypted payloads + PII and secret detection + tamper-evident audit log. Declared sensitive fields are sealed with per-record AES-GCM keys wrapped via RSA-OAEP before the agent sees them, so the agent carries ciphertext, not values. Outbound payloads are scanned for known credential and PII patterns; matches are redacted, blocked, or escalated to REQUIRE_HUMAN depending on the policy. Kernel's own audit log never captures plaintext for sealed fields; it records only the ciphertext handle, the policy decision, and an Ed25519 signature over the entry. The log itself is tamper-evident: every entry is chained and signed, so any after-the-fact modification is detectable.

Coverage type: full block for declared sensitive classes; detection + redaction for inferred classes. Where the regulatory regime requires it (PCI, HIPAA, GDPR's Article 32), the appropriate envelope-encryption protection level is selected at registration. Coverage is verifiable and demonstrable to auditors, and the tamper-evident audit trail is what most enterprise procurement and compliance reviews actually fail competitors on. This is Kernel's largest single compliance moat.
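Pattern-based scanning of outbound payloads can be sketched with two regular expressions. The patterns below (an AWS-style access key ID and a loose email matcher) and the `scan_and_redact` helper are illustrative assumptions; a production scanner would carry many more patterns and would tune them against false positives.

```python
import re

# Illustrative patterns only; real deployments need a much larger set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_and_redact(payload: str):
    """Return the redacted payload and the labels of everything found."""
    findings = []
    redacted = payload
    for label, pattern in PATTERNS.items():
        if pattern.search(redacted):
            findings.append(label)
            redacted = pattern.sub(f"[REDACTED:{label}]", redacted)
    return redacted, findings


outbound = "contact jane@example.com key AKIAIOSFODNN7EXAMPLE"
redacted, findings = scan_and_redact(outbound)
```

The `findings` list is what a policy would act on: redact and continue for low-risk classes, or block or escalate to a human when a credential shows up in an outbound call.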

10. Supply Chain Attacks

Risk. Third-party tools, APIs, or data sources integrated into the agent stack are compromised upstream. The agent calls what it believes is a trusted tool; the tool has been tampered with at the source.

Kernel control: endpoint allowlists + tool declaration + anomalous-destination detection. At agent registration, every tool and every external endpoint the agent is permitted to call is declared. Calls to undeclared destinations are denied. New or shifted destination signatures (a tool that suddenly responds from a different domain, or returns a payload structure that has changed materially) trigger a circuit breaker and a REQUIRE_HUMAN review.

Coverage type: scope reduction + drift detection. Kernel cannot verify the integrity of what a permitted endpoint returns; that requires vendor-side assurance and is outside any decision enforcement layer's reach. What Kernel does is prevent agents from drifting into undeclared third-party surfaces in the first place, and surface meaningful changes to declared surfaces for human review before they cascade into incidents.
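Both halves of this control, the allowlist and the drift signal, can be sketched briefly. The example assumes a hypothetical registry mapping each tool to one declared host, and treats a change in the top-level keys of a JSON response as "material" drift; both are simplifications chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical registration data: tool name -> declared host.
DECLARED_ENDPOINTS = {"fx_rates": "rates.example.com"}


def check_destination(tool: str, url: str) -> str:
    """Deny calls whose host is not exactly the one declared for the tool."""
    declared_host = DECLARED_ENDPOINTS.get(tool)
    host = urlsplit(url).hostname
    if declared_host is None or host != declared_host:
        return "DENY"
    return "ALLOW"


def schema_drifted(baseline_keys: frozenset, response: dict) -> bool:
    """Flag a response whose top-level structure differs from the baseline."""
    return frozenset(response) != baseline_keys


baseline = frozenset({"base", "rates"})
```

Comparing the parsed hostname exactly, rather than checking whether the URL contains the declared host, matters: a lookalike such as `rates.example.com.evil.test` contains the declared name but resolves somewhere else entirely. A drift flag would feed the REQUIRE_HUMAN review rather than an automatic block, since an upstream API can change shape for innocent reasons.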

Bonus: Excessive Autonomy

The OWASP cheat sheet flags an eleventh risk that often gets buried in the others: agents taking high-impact, irreversible actions without a moment of human oversight.

The REQUIRE_HUMAN verdict is the entire answer. Every action a customer classifies as high-impact routes through a human approval gate before execution, with a full audit narrative attached to the human's decision. This is the control most teams know they need and have not yet built. It is built into Kernel as a first-class verdict.

The honest map: where coverage really lives

If you map the ten OWASP AI agent security risks to coverage type, the picture looks like this. Full block at the action plane for Tool Abuse, Denial of Wallet, and Sensitive Data Exposure (for declared classes). Full block for activity flowing through Kernel, with auditable detection outside its surface, for Cascading Failures, AI Console Configuration, and Supply Chain. Containment plus forensics for Prompt Injection, Goal Hijacking, and Memory Poisoning: Kernel cannot prevent the cognitive compromise at the model or memory layer, but it can guarantee the compromise cannot execute consequentially and that the path is fully reconstructable from the tamper-evident audit log. For Data Exfiltration, envelope-encrypted, blind-courier transport for known classes, with a detection signal for novel encodings.

This is the honest map. Anyone offering a single-layer "we solve all of agent security" pitch should be made to draw the same picture. The OWASP categories are not equal in nature; they don't all yield to the same control; and the value of a decision enforcement layer is that it lives where the most categories *can* be enforced (the action plane) without pretending to live in the model or memory layers, where it cannot.

Why this matters now: EU AI Act, DORA, and Consumer Duty

The EU AI Act's Annex III obligations for high-risk AI systems (credit scoring, KYC, hiring, healthcare triage) become enforceable on 2 December 2027. The requirements include risk management systems, logging, transparency, human oversight, and post-market monitoring, every one of which maps onto a Kernel surface (tamper-evident audit log, REQUIRE_HUMAN, Intent Anchoring records, envelope-encrypted data handling).

DORA's operational resilience requirements for EU financial entities have applied since 17 January 2025 and demand the same auditable controls for any algorithmic process touching financial decisioning. The FCA's Consumer Duty in the UK places equivalent obligations on outcomes from automated systems.

The OWASP Agentic Top 10 is not a compliance framework; it is an engineering checklist. But the overlap with what auditors will be asking for in the next twelve months is near-total. If you are building or shipping AI agents into a regulated environment, mapping your stack against these ten risks is no longer optional. The question is whether you do it once, deliberately, at the action plane, or in ten places, with seams between them.

Walk the map against your own architecture

If you would like to walk through the OWASP AI agent security mapping against your specific agent architecture, we run that as a working session: no commitment, just the picture. Book a call and bring the action you are least comfortable letting your agent take in production.