- AI
- agents
- startups
Your Agent Infrastructure Is Probably Not Your Moat
My team spent a month building our own agent stack, then realized twenty other companies were building the exact same thing. On when to build agent infra and when to hand it off.
Also on Substack
My team spent a month building our own agent service. E2B for the sandbox, Redis for state, our own agent loop, then tool registration, tracing, and guardrails stacked on top layer by layer. It worked. But when I stepped back and looked at it, every single layer we’d built was something at least twenty other companies were building in exactly the same way.
Which is worth sitting with for a minute. Should a startup build its own agent infrastructure at all?
First, the case for building, because the teams that go this route aren’t being stupid. An agent’s behavior depends heavily on how you orchestrate the loop, how you manage context, and how you design your tool-calling strategy. All of that is tightly coupled to your business logic, and you don’t get it working well just by dropping in a generic framework. Building it yourself means total control over every layer. Swap the model whenever you want, tune the loop logic whenever you want, and when something breaks you can see all the way down to what actually happened. For a technically driven team that control isn’t imaginary. It genuinely lets you iterate fast and fail fast in the early days.
But the cost of maintaining that control grows exponentially over time, and nowhere more than around model upgrades. Anthropic’s own engineering blog has a great example. When they were building their agent harness, they found Sonnet 4.5 had a kind of “context anxiety,” where the model would wrap up a task early as it approached the context limit. So the team added a context reset to the harness to handle it. Then they moved to Opus 4.5 and the behavior was simply gone, which turned the earlier optimization into dead weight. If Anthropic’s own team has to deal with behavioral drift between their own models, does a startup really understand model upgrades better than the people who make the models? Our own experience matches this exactly. Every model upgrade, the thinking format changes, the tool-use behavior changes, and the agent loop has to change with it, and each round costs a week or two. You think you’re building a product. You’re actually chasing the model’s tail.
And the chase isn’t only about model behavior. When we built our own sandbox, just handling container cold-start latency, filesystem persistence, and network-isolation policies ate nearly two weeks of one engineer’s time. None of these is complicated on its own, but stacked together they form a black hole that constantly drains attention, leaving your core team with no bandwidth for the question that actually matters: what do users actually need the agent to do, and how good is good enough.
The PaaS and IaaS layers are also maturing faster than a lot of people expected. In early April, Anthropic opened the public beta of Managed Agents. It’s not an API wrapper. It hosts the entire agent runtime, with sandbox, filesystem, browsing, checkpointing, and credential management built in, deeply tuned for their own models, and their numbers put complex-task success rates nearly 10 points above a standard prompting loop. AWS AgentCore takes the other path. At the IaaS layer it gives you a serverless runtime, a tool gateway, persistent memory, and a code interpreter without locking you to a model, more like a kit of parts you assemble yourself. One says “I’ll manage all of it for you,” the other says “here are the components, snap them together.” But both are doing the same thing underneath: freeing application-layer teams from building infra from scratch.
The arrival of those two turns “build or host” from “build is the only option” into “it depends.”
There are of course cases where building is still the only choice. If your product is the agent infra layer itself, say multi-model routing, agent observability, or security auditing, then building is your core business and there’s nothing to debate. If you have hard requirements on data residency, latency, or compliance, hosted services genuinely may not meet them yet. And if your orchestration logic is coupled to the business so deeply that no general platform can hold it, that’s a legitimate reason to build. But these are the minority among startups. For most teams, the real reason they build isn’t “we have to,” it’s “it feels like we should.”
For most application-layer startups, agent infrastructure is probably not your moat. Users don’t care whether your sandbox is E2B or Firecracker. They care whether the agent gets the job done. Your differentiation lives in your understanding of the business, your design of the use case, and your data flywheel, not in your infra. Pouring your scarcest engineering resources into the most commoditized layer is a trade that doesn’t add up.
What I do now is simple. If PaaS covers it, use PaaS. If it doesn’t, assemble it from IaaS. Only the genuinely unique sliver gets built in-house. Our agent, for example, has to maintain a complicated business state machine across many turns of conversation, and the transition logic of that state machine is tightly bound to our domain knowledge, so no general platform can do it for us. That part we write ourselves. But sandbox, tool execution, tracing, all of it is handed off. It may not be the optimal setup, but at least it keeps the team spending its energy on the only things only we can do.
02 · More writing