When pods crash at odd hours, you need an AI that investigates like an SRE, not a chatbot. You need something that checks the right things, in the right order, tells you what it found, and waits for your call before touching anything. Current tools either dump a wall of logs on you and say "good luck," or run opaque automations you can't see, can't trust, and can't explain in a postmortem.
We spent the last several months designing and building Nova's investigation engine. This post is about the approach we took, the mental models that shaped it, and the trade-offs we made along the way.
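The investigation style described above has four steps: check things in a deliberate order, record what was found, propose a fix, and wait for a human decision before acting. Below is a minimal Python sketch of that loop. `Finding`, `investigate`, and `propose_and_apply` are hypothetical names used only for illustration, and this is not Nova's actual engine. The checks are stubs that stand in for real cluster, log, and deploy-history queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical types for illustration only; not Nova's actual API.
@dataclass
class Finding:
    check: str
    healthy: bool
    detail: str


Check = Callable[[], Finding]


def investigate(checks: List[Check]) -> List[Finding]:
    """Run checks in a fixed triage order and record every result."""
    return [check() for check in checks]


def propose_and_apply(
    findings: List[Finding],
    fixes: Dict[str, str],
    approve: Callable[[str], bool],
) -> List[str]:
    """Propose a fix for each unhealthy finding; act only on human approval."""
    log: List[str] = []
    for finding in findings:
        if finding.healthy or finding.check not in fixes:
            continue  # nothing to fix, or no known remediation for this check
        action = fixes[finding.check]
        if approve(action):
            # Placeholder: a real system would call the cluster API here.
            log.append(f"applied: {action}")
        else:
            log.append(f"held: {action}")
    return log


if __name__ == "__main__":
    # Stubbed checks; real ones would query the cluster, logs, and deploy history.
    checks = [
        lambda: Finding("pod_status", False, "CrashLoopBackOff"),
        lambda: Finding("recent_deploys", True, "no deploys in the last 24h"),
    ]
    findings = investigate(checks)
    for f in findings:
        print(f"{f.check}: {'ok' if f.healthy else 'PROBLEM'} ({f.detail})")
    # The human declines here, so nothing is touched.
    print(propose_and_apply(findings, {"pod_status": "roll back deployment"},
                            approve=lambda action: False))
```

In this sketch the approval callback is the only path to any action. A declined or unanswered proposal is logged as "held" rather than silently dropped, so the trail remains available for a postmortem.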
Every AI agent demo looks the same. The model calls a tool, gets a result, responds. Ship it. Then you try to run it against real infrastructure — and the demo falls apart in ways nobody warned you about.
We've spent over a year building Nova, an AI agent that operates real infrastructure for real teams. Not a chatbot that wraps API calls, but a system that investigates incidents, executes remediations, and composes across dozens of integrations. This post is about what we learned — the problems that made us rebuild entire subsystems, and the patterns that survived.
Platform teams carry operational knowledge that doesn't transfer easily. The debugging instincts, the service interdependencies, the deployment quirks — they accumulate over years and live in a small number of people's heads. When those people are unavailable, the gap shows.
We built Nova to encode that operational knowledge into a queryable system. This post covers what the architecture looks like and what we learned building AI that actually operates infrastructure.
Platform engineering represents the natural evolution of DevOps and SRE principles, but it faces a fundamental challenge: how do you scale platform expertise across an entire organization without requiring every developer to become a cloud expert?
This is where Nova comes in — your AI platform engineer that makes infrastructure accessible to everyone through natural conversation.
The Platform Engineering Evolution
DevOps broke down silos → SRE brought engineering rigor to operations → Platform Engineering created self-service infrastructure → Nova makes platform engineering conversational and accessible to everyone.
The Platform Engineering Challenge: Scale vs. Expertise
Platform engineering promised to solve the "you build it, you run it" scaling problem by creating Internal Developer Platforms (IDPs). But even the best platforms face fundamental limitations:
Expert Bottlenecks
Platform teams become the new constraint, because everyone depends on their expertise.
Documentation Decay
Complex systems need constant documentation, and that documentation quickly goes out of date.
Context Loss
Critical operational knowledge lives as tribal knowledge in people's heads, not in systems.
Cognitive Load
Developers still need to understand infrastructure concepts to use platforms effectively.
The Core Issue
We've built self-service platforms, but we haven't solved the underlying problem of democratizing platform engineering expertise.
Nova is an AI platform engineer that helps you manage infrastructure through natural conversation. Ask questions, get answers, generate configurations, and troubleshoot issues — all through simple chat.
What makes Nova different:
Works with your existing tools (Slack, GitHub, AWS, Terraform, Kubernetes, and more)
Available however you want to work: in the browser, self-hosted, or in your editor
Nova's power comes from its extensibility. Connect the tools you already use:
Available Skills:
Cloud Providers — AWS, Google Cloud, Azure cost calculations and resource management
Communication — Slack integration for team collaboration
Development — GitHub for code search, issues, and PRs
Infrastructure — Terraform and Helm configuration generation
Kubernetes — Cluster management and troubleshooting
And more — Add any MCP server for custom integrations
Bring Your Own Tools
Nova Direct includes the built-in MCP Marketplace for custom integrations. Nova Connect works through standard MCP-compatible clients, so teams can bring Nova into existing editor and CLI workflows without running the server locally.
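MCP (Model Context Protocol) messages are JSON-RPC 2.0. A client discovers a server's tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below only builds those request messages, to show what "bringing your own tools" amounts to at the protocol level. It omits the `initialize` handshake, the transport (stdio or HTTP), and response handling. The tool name `get_pod_logs` and its arguments are hypothetical and are not part of Nova's actual tool catalog.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

# Monotonic request IDs so responses can be matched to requests.
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the message format MCP uses."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools() -> str:
    # Ask the server which tools it exposes.
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    # Invoke one tool by name with its JSON arguments.
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(list_tools())
    # "get_pod_logs" and its arguments are hypothetical, not a real Nova tool.
    print(call_tool("get_pod_logs", {"namespace": "payments", "pod": "api-7f9c"}))
```

Any MCP server that answers these two methods can expose its tools to an MCP-compatible client. This shared message shape is what allows custom integrations to plug in without changes to the client.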
The evolution from DevOps → SRE → Platform Engineering → AI-Assisted Platform Engineering represents more than technological progress — it's about democratizing expertise that has historically been scarce and expensive.
Traditional Model
Small teams of platform experts serve entire organizations