The AI SRE Race Is Running the Wrong Way

Diagnosis is a commodity. Trust is the product.
The AI SRE race will not be won by the agent that diagnoses fastest. It will be won by the system that operators trust enough to grant write access.
A personal note on where AI for operations is actually heading.
Over the past year a new category filled up fast. Depending on how you count, there are now more than a dozen credible tools that call themselves AI SREs. I have watched the space closely, partly because we are building in it, and partly because the speed of convergence is genuinely interesting.
Here is what nearly all of them do. They connect to your telemetry, your code, and your incident tooling. They correlate logs, metrics, and traces. When an alert fires, they form hypotheses, test them against the evidence, and post a likely root cause into Slack, often in under a minute. This is real progress. A few years ago none of it worked. Today most of it does.
Diagnosis is real progress. But it is just phase one.
I want to start by admitting something that a lot of vendors in my position would rather not say. If your goal is read-only triage, you do not strictly need any of these products. The building blocks are now standard. With open protocols like the Model Context Protocol (MCP) and Agent2Agent (A2A), plus a capable model, teams can quickly wire assistants into internal systems, tools, and other agents and get a reasonable root-cause guess back. The diagnosis layer is becoming a commodity. I do not think that is a controversial claim anymore, and pretending otherwise is how you lose the trust of the exact engineers you are trying to reach.
So if diagnosis is the easy half, what is the hard half?
The hard half is earning permission to act in production.
Google's own SRE team is now building agents that act in production, and they put it more plainly than any vendor has: the hard problem is safe action, not diagnosis. Reading production and changing production are different problems, and the gap between them has little to do with how smart the model is. A model that can tell you what broke still cannot answer the questions an operator has to answer before anyone lets it touch a live system. What exactly is this allowed to change, and what is off limits? How large can the blast radius get before it must stop and ask? Who approved this, under what scope, and when does that approval expire? If it goes wrong, what is the rollback, and is every step on the record? Can one customer's action ever reach another customer's environment?
Most of the field answers these with a single control: a human clicks approve. That is a reasonable place to start, and I respect the teams that are honest about staying there. But one approval button is not a control plane. The independent numbers are sobering. SREGym, a benchmark of 90 realistic SRE problems, shows that frontier agents still vary widely across production failure scenarios, with large gaps in end-to-end results. Gartner also predicts that more than 40 percent of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The capability is arriving. The trust to use it is not, and that trust gap is not a model problem.
One approval button is not enough. Production action needs scope, blast-radius limits, rollback paths, policy, and audit.
This is where my personal view comes in, and it is the reason we started the company the way we did.
The model was never the bottleneck. The missing layer was control.
The AI expresses intent. The runtime decides what is safe to execute.
You do not make AI safe for production by making the model bigger. You make it safe with a governed runtime around it. The AI expresses intent. The runtime decides what executes. Operational knowledge accumulates in the platform, where it can be audited, rather than inside a model that forgets the moment the conversation ends. In practice that means scoped actions, blast-radius checks before a change runs, policy enforced at machine speed, approvals at the right moments, and an audit trail that explains what happened after the fact.
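To make the split between intent and execution concrete, here is a minimal sketch in Python of the kind of decision a governed runtime might make. It is an illustration under stated assumptions, not a description of how any real product works: the names Intent, Policy, Approval, and Runtime, the verdict strings, and the specific rules are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """What the AI proposes. It carries no authority by itself."""
    tenant: str
    action: str                     # hypothetical action name, e.g. "restart_deployment"
    targets: tuple                  # resources the action would touch
    rollback: Optional[str] = None  # name of the undo step, if one exists


@dataclass(frozen=True)
class Approval:
    """A human approval bound to one tenant, one action, and an expiry time."""
    approver: str
    tenant: str
    action: str
    expires_at: datetime


@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset  # the scope: anything else is off limits
    max_blast_radius: int       # most targets allowed without asking a human


@dataclass
class Runtime:
    policies: dict                                  # tenant name -> Policy
    audit_log: list = field(default_factory=list)   # every decision is recorded

    def decide(self, intent, approval=None, now=None):
        """Return "allow", "needs_approval", or "deny", and log the decision."""
        now = now or datetime.now(timezone.utc)
        verdict, reason = self._evaluate(intent, approval, now)
        self.audit_log.append({
            "at": now,
            "tenant": intent.tenant,
            "action": intent.action,
            "targets": len(intent.targets),
            "verdict": verdict,
            "reason": reason,
        })
        return verdict

    def _evaluate(self, intent, approval, now):
        # Tenant isolation: only the intent's own tenant policy is ever consulted.
        policy = self.policies.get(intent.tenant)
        if policy is None:
            return "deny", "no policy for tenant"
        if intent.action not in policy.allowed_actions:
            return "deny", "action out of scope"
        if intent.rollback is None:
            return "deny", "no rollback path"
        if len(intent.targets) <= policy.max_blast_radius:
            return "allow", "within blast-radius limit"
        # Above the limit, a matching and unexpired approval is required.
        if (approval is not None
                and approval.tenant == intent.tenant
                and approval.action == intent.action
                and approval.expires_at > now):
            return "allow", "approved by " + approval.approver
        return "needs_approval", "blast radius exceeds limit"
```

In this sketch the model only ever produces an Intent. Whether anything runs depends on the policy, the size of the change, and an approval that is scoped and expires, and every decision lands in the audit log whether it was allowed or not.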
I am not the only one who thinks the defensible work lives here. As one investor put it recently, the moat is not the intelligence of the agent, it is the safety envelope around it, and that layer is dramatically underbuilt across the ecosystem. Anyone can assemble the diagnosis: a model, a few connectors, a prompt. Far fewer teams have built the layer that lets AI act safely, and that is the part that is genuinely hard to copy. It is also the part that compounds. A diagnosis is a fresh guess on every incident. The runtime learns what is safe to automate, what your real blast radius is, and which actions have earned more autonomy. Part of why it stays underbuilt is that it is the unglamorous half: it is far easier to demo an agent confidently naming a root cause than an agent declining to act because the blast radius was too large.
The fair question is why this should be its own layer rather than a feature the cloud or the observability vendor bolts on. Because production is never one vendor. The control layer has to sit across whatever you already run, enforce at the runtime itself rather than suggest from outside, and stay neutral to the stack beneath it. That is a different kind of product than an assistant added to a dashboard, and it is why we built it as the foundation, not a feature.
There is a quieter truth underneath all of this. When we trust a human in production, part of what we are trusting is their slowness. Reading a runbook step, pausing to think, double-checking before running the command: that friction is an accidental safety mechanism, and it is exactly the thing AI removes. An agent can execute a thousand operations before anyone notices the first one was wrong. The job of the control layer is to replace accidental friction with intentional control, so that autonomy is earned as patterns prove safe, rather than assumed on day one.
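One way to picture earned autonomy is as a dial per action pattern that only moves up after a run of safe outcomes and drops back the moment something goes wrong. The Python sketch below is a deliberately simple illustration of that idea. The level names, the AutonomyTracker class, and the promotion threshold are assumptions invented for this example, not a description of any real system.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels, from least to most trusted.
LEVELS = ("suggest_only", "act_with_approval", "act_autonomously")


@dataclass
class AutonomyTracker:
    """Tracks how much autonomy one action pattern has earned."""
    promote_after: int = 5  # assumed number of consecutive safe runs per promotion
    level: int = 0          # index into LEVELS; everything starts at suggest_only
    streak: int = 0         # consecutive safe runs at the current level

    def record(self, succeeded: bool) -> str:
        """Record one outcome and return the resulting autonomy level."""
        if succeeded:
            self.streak += 1
            can_promote = self.level < len(LEVELS) - 1
            if can_promote and self.streak >= self.promote_after:
                self.level += 1
                self.streak = 0
        else:
            # A single failure sends the pattern back to the start.
            self.level = 0
            self.streak = 0
        return LEVELS[self.level]
```

The asymmetry is the point of the design: trust is gained slowly, one safe run at a time, and lost all at once.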
So here is where I think the race actually goes. Read today, write next. Not because the models get smarter, though they will, but because the control layer matures to the point where you can safely turn the dial up. The company that wins this category will not be the one that is best at telling you what broke. It will be the one you trust to do something about it. Root cause is the demo. Trusted action is the business.
AstroPulse exists for the moment AI moves from read-only diagnosis to trusted production action. Nova is the AI platform engineer. Astro Platform is the runtime that keeps it safe.