
The bug that became a company

An agent we shipped at our last company handed one customer's documents to another. The layer we needed to stop it didn't exist, so we built it.

Hugo Hernandez · Co-founder & CEO

June 8, 2026

It started with a name collision

At Selfers, the company Jimmy and I built and later sold, we ran an agent in production called Sparq. Its job was ordinary: a customer asked for their documents, Sparq found them and delivered them. It worked thousands of times.

Then two customers had the same name.

Sparq disambiguated the way a confident system does. It picked one, pulled the file, and delivered it. To the wrong person. One customer received another customer's documents: a cross-account PII leak, caused by an agent doing exactly what we had asked it to do.

That last part is what kept us up. Nobody wrote a bug. No test would have caught it, because there was nothing to catch: the code was correct. The agent read the request, reasoned about it, chose, and acted. The mistake lived entirely inside the half-second between the instruction and the action. And we had nothing watching that half-second.

Why nothing caught it

We were not careless. We had the stack everyone has. We knew who Sparq was: it had an identity, scoped credentials, the right permissions. We had logs, so after the leak we could reconstruct exactly what it had done and when. Identity told us *who*. Observability told us *what happened*, after it had already happened.

What we did not have was anything standing between the instruction and the action, asking the one question that actually mattered in that moment: *should this run?*

That gap has a shape. Software Analyst Cybersecurity Research later mapped the agent-security stack into three layers: identity and access (who the agent is), observability (what it did, analysed after the fact), and runtime governance (what it is about to do, decided before it acts). We had the first two. The leak happened in the third, and the third was empty. Identity can't introspect intent. Observability is reactive by construction; it can only tell you about the breach once the breach is in the log.

Sparq didn't expose a hole in our code. It exposed a hole in the category of tools available to us.

The shift underneath the bug

For five decades, software was deterministic. Same input, same output: predictable, testable, auditable before it ever shipped. You could reason about what your system would do because it would do the same thing every time.

Agents broke that contract. They're non-deterministic by design: the model decides in the moment, and you cannot guarantee its behaviour by testing beforehand. For a while that was tolerable, because agents mostly *read*. They summarised, retrieved, suggested. The blast radius of a wrong answer was small.

What changed in 2025 and 2026 is that agents crossed from reading to *acting*. They execute payments, approve KYC, move data, write to production. And the moment that happened, the equation flipped: you now have a non-deterministic process in charge of consequential, irreversible actions, with no layer governing those actions at the instant they occur.

Sparq wasn't a freak event. It was the first visible symptom of that shift, showing up in our own production before we had a name for it.

So we built the layer we wished we'd had

The fix we needed didn't exist as a product, so we wrote it ourselves. That tool became Kernel.

The idea is deliberately small. Before an agent takes any consequential action, the backend makes one call, Kernel.check(), and gets back one of three verdicts: ALLOW (within policy, proceed), DENY (policy violation, blocked), or REQUIRE_HUMAN (hold the action, route it to a person, resume only once they approve). Every decision is logged immutably and mapped to the regulations that care about it.
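To make the shape of that call concrete, here is a minimal sketch in Python. It is a toy stand-in, not the Kernel SDK: the `Action` fields, the `check` function, the `guarded_deliver` helper, and the single ambiguity rule are all assumptions made for illustration. Only the three verdict names come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Verdict(Enum):
    ALLOW = "ALLOW"                  # within policy, proceed
    DENY = "DENY"                    # policy violation, blocked
    REQUIRE_HUMAN = "REQUIRE_HUMAN"  # hold until a person approves


@dataclass(frozen=True)
class Action:
    # Hypothetical description of what the agent is about to do.
    kind: str                           # e.g. "deliver_documents"
    requester_account: str              # account that made the request
    candidate_accounts: Tuple[str, ...]  # accounts the agent matched


def check(action: Action) -> Verdict:
    """Toy pre-action policy check (an assumption, not the real API)."""
    if not action.candidate_accounts:
        return Verdict.DENY
    if len(action.candidate_accounts) > 1:
        # Ambiguous match, e.g. two customers with the same name.
        return Verdict.REQUIRE_HUMAN
    if action.candidate_accounts[0] != action.requester_account:
        # The only match belongs to someone else: cross-account access.
        return Verdict.DENY
    return Verdict.ALLOW


def guarded_deliver(action: Action, send: Callable[[str], None]) -> Verdict:
    """Run the side effect only when the verdict is ALLOW."""
    verdict = check(action)
    if verdict is Verdict.ALLOW:
        send(action.candidate_accounts[0])
    return verdict
```

In this sketch, a request that matches two accounts never reaches `send`; it comes back as `REQUIRE_HUMAN`, and the caller decides how to route it to a person.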

Two of our design choices came straight out of the Sparq incident.

The first is REQUIRE_HUMAN as a first-class verdict. Most governance tooling gives you a binary: allow or deny. But the action that hurt us wasn't obviously wrong; it was *borderline*. A confident agent resolving an ambiguous request is exactly the case where you don't want a hard yes or a hard no. You want the system to stop, ask a human, and capture who decided what. As far as we know, no one else offers this as a native verdict. It's also, not coincidentally, what the EU AI Act's human-oversight obligations (Annex III, enforcement 2 December 2027) require for high-risk decisions like credit, KYC, and hiring.
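One way to picture "stop, ask a human, and capture who decided what" is a small hold-and-resume queue. The sketch below is illustrative only and assumes nothing about how Kernel implements it: `ReviewQueue`, `hold`, and `resolve` are hypothetical names, and the in-memory list stands in for whatever durable, immutable log a real system would use.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


class ReviewQueue:
    """Holds deferred actions until a named reviewer approves or rejects them."""

    def __init__(self) -> None:
        self._pending: Dict[int, Tuple[str, Callable[[], None]]] = {}
        self.decisions: List[dict] = []  # stand-in for an audit log
        self._next_id = 1

    def hold(self, description: str, run: Callable[[], None]) -> int:
        """Park an action without executing it; return its ticket id."""
        action_id = self._next_id
        self._next_id += 1
        self._pending[action_id] = (description, run)
        return action_id

    def resolve(self, action_id: int, reviewer: str, approved: bool) -> None:
        """Record who decided what, then run the action only if approved."""
        # pop() raises KeyError for unknown or already-resolved tickets,
        # so an action cannot be approved twice.
        description, run = self._pending.pop(action_id)
        self.decisions.append({
            "action_id": action_id,
            "description": description,
            "reviewer": reviewer,
            "approved": approved,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        })
        if approved:
            run()
```

The point of the design is that the side effect is a deferred callable: nothing happens at `hold`, and the decision record is written before the action runs, so the log never shows an action without a named decider.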

The second is an SDK, not a proxy. Kernel runs in-process inside the customer's own backend. Credentials, data, and private keys never leave their infrastructure. We made that choice for a simple reason: after living through a data leak, we would never have handed our customers' data to a third-party vendor to route it for us. We weren't going to ask anyone else to either.

We checked it wasn't just our scar tissue

A real incident makes you certain. It doesn't make you right. So before we committed, we went and talked to people living the same problem: AI engineers shipping agents inside other companies, and cybersecurity CTOs who own this risk for a living. The gap we'd fallen into wasn't ours alone. Everyone deploying agents that *act* was standing on the same missing layer; most just hadn't had their Sparq moment yet.

What we're building toward

Sparq is a small story. One agent, two customers, one file sent to the wrong person. It didn't make the news. But it's the cleanest illustration we know of where the entire industry is heading: capable systems, acting on real things, faster than anyone can review, with the decisive moment hidden in a half-second no existing tool was watching.

We started Kernel so that the next team doesn't learn what their agent did from a log entry after the fact. The goal was never to slow agents down or keep them read-only and safe. It's the opposite: to let teams give their agents real authority over money, data, and production, and still be able to sleep. Run agents in production. Not in fear.

That's the company the bug became.

If any of this sounds like a half-second you'd rather not leave unwatched, book a 30-minute call.