Industry

Banco Santander just open-sourced agent governance

Santander AI Lab open-sourced its agent governance research under Apache-2.0. When a G-SIB publishes the primitives, the category is no longer a thesis. Here's the signal, and where regulated teams need to keep going.

Hugo Hernandez · Co-founder & CEO

June 25, 2026 · 4 min read

On June 17, 2026, Santander AI Lab pushed thirteen repositories to GitHub under Apache-2.0 and CC BY 4.0. The org describes itself plainly: "Open source AI projects from Banco Santander. Responsible AI, MLOps, graph ML and LLM evaluation for financial services." Most of the release is the research tooling you would expect from a bank's AI lab. One repo is something else.

mech-gov-framework is, in Santander's own words, "Mechanical Governance for LLM Decisions": model-agnostic governance regimes, hard gates, and governance metrics for high-stakes decisions. Strip away the research framing and what is left is a sketch of the exact problem Kernel productises. A bank built it, ran it against synthetic credit, KYC, and AML cases, and put it on the internet for anyone to read.

This post is for the engineering and compliance leads at Series A and Series B fintechs who are watching the same regulatory clock Santander is, and trying to work out what this release means for them.

The second validator

In April 2026, Microsoft published the Agent Governance Toolkit under the MIT License. We wrote at the time that when Microsoft endorses a problem space with permissive open-source code, the question stops being "is this a real category?" and starts being "which version do I deploy?" That was the platform vendor validating the category from the infrastructure side.

Santander is a different kind of validator, and arguably a more telling one. Microsoft governs its own developer agents. Santander is a global systemically important bank, subject to the EU AI Act, to DORA, and to the full weight of prudential regulation. When the regulated incumbent, the buyer profile itself, builds runtime governance for AI agents in-house and decides the underlying primitives are worth open-sourcing, that is the market telling you where it is heading. Two of the most credible institutions in software and in banking have now independently concluded that consequential agent actions need a deterministic gate in front of them. The thesis is settled.

What Santander actually published

The relevant repos cluster around responsible decisioning in financial services, and they are worth understanding on their own terms.

mech-gov-framework runs a per-case pipeline of hard gates that fire before any model call: a sanctions hit above a risk threshold forces a decline, an insider flag forces an escalation to a human, a case with too little information is deferred rather than decided. Every gate carries a written rationale. The framework also ships governance metrics, including measures of whether an escalation was substantive or merely cosmetic. autoguardrails holds an adversarial evaluation suite fixed and accepts a new guardrail policy only if it lowers attack success without breaking benign traffic. gen-fraud-graph generates synthetic AML transaction graphs, and sota-stressed-datasets republishes known datasets with deliberate noise, missingness, and contradictions to test how systems behave when the input is ambiguous.
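To make the gate idea concrete, here is a minimal sketch of a pre-model gate pipeline along the lines described above. It is not Santander's code: the `Case` fields, the threshold values, the gate order, and every function name are assumptions made for illustration. The only things taken from the description are the three outcomes (decline, escalate, defer), the rule that gates run before any model call, and the requirement that each gate states its rationale.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    DECLINE = "decline"
    ESCALATE = "escalate"
    DEFER = "defer"


@dataclass(frozen=True)
class Case:
    # Hypothetical case fields, chosen only for this example.
    sanctions_risk: float   # 0.0 (no match) to 1.0 (certain match)
    insider_flag: bool
    fields_present: int
    fields_required: int


@dataclass(frozen=True)
class GateResult:
    gate: str
    verdict: Verdict
    rationale: str          # every gate explains itself in writing


SANCTIONS_THRESHOLD = 0.8   # assumed value
MIN_COMPLETENESS = 0.6      # assumed value


def sanctions_gate(case: Case) -> Optional[GateResult]:
    if case.sanctions_risk > SANCTIONS_THRESHOLD:
        return GateResult(
            "sanctions", Verdict.DECLINE,
            f"sanctions risk {case.sanctions_risk:.2f} exceeds {SANCTIONS_THRESHOLD}",
        )
    return None


def insider_gate(case: Case) -> Optional[GateResult]:
    if case.insider_flag:
        return GateResult("insider", Verdict.ESCALATE,
                          "insider flag set; a human must decide")
    return None


def completeness_gate(case: Case) -> Optional[GateResult]:
    ratio = case.fields_present / case.fields_required
    if ratio < MIN_COMPLETENESS:
        return GateResult("completeness", Verdict.DEFER,
                          f"only {ratio:.0%} of required fields are present")
    return None


# Gate order is an assumption; the first gate that fires wins.
GATES: List[Callable[[Case], Optional[GateResult]]] = [
    sanctions_gate, insider_gate, completeness_gate,
]


def run_gates(case: Case) -> Optional[GateResult]:
    """Return the first gate that fires, or None if the case may reach the model."""
    for gate in GATES:
        result = gate(case)
        if result is not None:
            return result
    return None
```

A `None` result means no hard gate fired and the case is allowed to proceed to the model. Anything else is a final, deterministic outcome with its rationale attached, and the model is never consulted.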

Read together, these are the components of a regulator-facing governance stack: deterministic gates, human escalation, adversarial policy validation, and stress data that mirrors the messy reality of a real credit or KYC file. This is an implementation of EU AI Act Annex III high-risk obligations for creditworthiness and KYC, enforced through the Article 14 requirement for human oversight, expressed as code.
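The adversarial policy validation step is the easiest of these components to express in a few lines. The sketch below is an assumption-laden illustration of the acceptance rule described for autoguardrails, not its actual interface: `EvalResult`, `accept_policy`, and the `benign_tolerance` parameter are hypothetical names, and both policies are assumed to have been scored against the same fixed evaluation suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    # Both rates are assumed to come from the same fixed evaluation suite.
    attack_success_rate: float   # share of adversarial inputs that got through
    benign_pass_rate: float      # share of benign inputs handled correctly


def accept_policy(baseline: EvalResult, candidate: EvalResult,
                  benign_tolerance: float = 0.0) -> bool:
    """Accept a candidate guardrail policy only if it is strictly harder to
    attack than the baseline and does not degrade benign traffic by more
    than the (hypothetical) tolerance."""
    lowers_attacks = candidate.attack_success_rate < baseline.attack_success_rate
    keeps_benign = (
        candidate.benign_pass_rate >= baseline.benign_pass_rate - benign_tolerance
    )
    return lowers_attacks and keeps_benign
```

Holding the suite fixed is what makes the comparison meaningful: a policy cannot look better simply because it was tested against easier attacks.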

The detail that matters most

Santander's framework does not output a binary. Its decisions resolve into approve, decline, and, critically, two distinct flavours of "send this to a human": escalate and defer. A bank's own research team, building governance for its own high-risk decisions, made human routing a first-class output of the system rather than a fallback for when the rules run out.

That is the single design principle Kernel is built around. The third verdict, route to a human, is the part of agent governance that the binary tools skip and the part regulated teams cannot ship without. It is striking, and useful, that when a G-SIB sat down to model governance from scratch, it arrived at the same conclusion. The framework even includes metrics for whether that human review is real or rubber-stamped, which tells you how mature the thinking in the category has become. The question is no longer whether to put a human in the loop. It is whether your system can prove the human did something.

What a bank can build that a startup cannot

Here is the asymmetry the Santander release exposes. A G-SIB has an AI research lab, a standing compliance function, and the time to build a research-grade governance framework in-house and refine it against its own case data. The Series A fintech shipping a credit or payments agent into the same EU AI Act Annex III surface, facing the same DORA obligations, has none of that. It has a deadline and a small engineering team that would rather be building the product.

Santander published the primitives. It did not publish a production system you can deploy on Monday. mech-gov-framework is single-case research code that puts an LLM in the decision loop and grades its own reasoning. It is a model for thinking, not a runtime you drop in front of a live agent that is about to move real money. Turning those primitives into something production-ready, on a hot path that adds no noticeable latency, with credentials that never leave your infrastructure and an audit log a regulator will accept, is the work. For a bank, that work is a line item. For a startup, it is the difference between shipping this quarter and missing the Annex III deadline entirely.

Where Kernel fits

Kernel is that production system, built for the teams that have Santander's regulatory exposure without Santander's research budget. One in-process SDK call wraps every consequential agent action and returns one of three verdicts: allow, block, or route to a human. The guardrails covering the OWASP Agentic Top 10 are immutable and run before any policy check. Intent is anchored at agent creation, so mid-session manipulation is caught before execution. Every decision is signed, hash-chained, and exportable in EU AI Act, DORA, and AIUC-1 formats. Credentials never leave your infrastructure, data never leaves your process, and most actions clear in under ten milliseconds.
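For readers who have not met the term, "signed and hash-chained" means that each log entry commits to the one before it, so editing or removing an earlier decision invalidates everything after it. The sketch below shows the general technique only. It is not Kernel's SDK or log format: the function names and entry layout are hypothetical, and an HMAC with a shared key stands in for a signature to keep the example within the standard library. An asymmetric signature would be needed if the verifier must not hold the signing key.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def _sign(entry_hash: str, key: bytes) -> str:
    # HMAC-SHA256 as a stand-in for a real signature scheme.
    return hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: List[Dict], payload: Dict, key: bytes) -> Dict:
    """Append a decision record that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry_hash = _digest(prev_hash, payload)
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash,
        "signature": _sign(entry_hash, key),
    }
    log.append(entry)
    return entry


def verify_log(log: List[Dict], key: bytes) -> bool:
    """Recompute every hash and signature; any edit breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        expected_hash = _digest(prev_hash, entry["payload"])
        if not hmac.compare_digest(entry["hash"], expected_hash):
            return False
        if not hmac.compare_digest(entry["signature"], _sign(expected_hash, key)):
            return False
        prev_hash = entry["hash"]
    return True
```

With a log built this way, changing a recorded verdict after the fact causes `verify_log` to return `False`. That tamper-evidence is the property an auditor cares about.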

The instincts a systemically important bank encoded into its research framework (deterministic gates, first-class human escalation, evidence built in rather than bolted on) are the same instincts Kernel ships as a product. Santander showed you what good governance looks like when an institution with a research lab and a standing compliance function designs it. We built the version you can put in front of a production agent in weeks, not quarters.

If you are deploying an agent into a regulated decision and watching the August 2, 2026 Annex III deadline approach, book a 30-minute call. Bring the action you are least comfortable letting your agent take in production. We will show you exactly how Kernel decides on it.