5 min read
  • AI
  • agents
  • infrastructure

The Agent Platform Wars Are Really a Quiet Rewrite of Serverless

Strip the branding off AgentCore, AgentRun, AgentKit and the rest, and the five major clouds are all doing one thing: rewriting serverless from the ground up for agents.

The naming alone is a mess. AgentCore, AgentRun, AgentKit, Agent Engine, Agents SDK. Every cloud provider is fighting for the same word, and most of the coverage just tallies up who supports how many models and who open-sourced which framework. But strip away the surface, lay the actual product docs from the five major clouds side by side covering roughly the second half of 2025 through the first half of 2026, and they’re all doing the same thing underneath. They’re rewriting serverless, end to end, for agents.

Start with the boundary between runtime and sandbox, which has basically dissolved over the past year. A year ago there were two separate camps. One was busy with frameworks like LangChain, AutoGen, and CrewAI. The other was busy with code sandboxes like E2B, Modal, and Daytona. Neither paid much attention to the other. Then AWS shipped AgentCore and the line all but disappeared. A single session can run for up to eight hours, is reclaimed after fifteen idle minutes, sits on per-session microVM isolation underneath, and speaks LangGraph, CrewAI, or Strands above. Alibaba Cloud’s AgentRun went further, running on its function-compute layer with a shallow sleep that wakes in a millisecond and a deep sleep that wakes in seconds. That squeezes every drop out of the one trait that defines an agent: it runs for a long time, but spends most of that time waiting. Volcano Engine’s AgentKit wires four sandbox types (code, browser, CUA, MUA) into one runtime with cold starts under 150 milliseconds, and right now no one else in China matches that completeness. Cloudflare skipped the microVM path entirely and made each agent a Durable Object with its own SQLite and WebSocket, distributed to the edge. Different mechanisms, same answer to the same question: agents need to be stateful, long-lived, and wakeable on demand.
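The lifecycle these runtimes converge on can be sketched as a small state machine. This is a minimal illustration and not any vendor's API: the class, method, and state names are hypothetical, and the only numbers taken from the text are the eight-hour session cap and the fifteen-minute idle timeout quoted for AgentCore.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"        # executing code or handling a request
    IDLE = "idle"            # alive but waiting (on an LLM, a tool, a user)
    RECLAIMED = "reclaimed"  # resources released by the platform


@dataclass
class AgentSession:
    """Hypothetical model of a stateful, long-lived, wakeable agent session."""
    max_lifetime_s: float = 8 * 3600   # eight-hour session cap
    idle_timeout_s: float = 15 * 60    # reclaimed after fifteen idle minutes
    started_at: float = 0.0
    last_active_at: float = 0.0
    state: State = State.ACTIVE

    def go_idle(self, now: float) -> None:
        """The agent starts waiting; the idle clock begins here."""
        if self.state is State.ACTIVE:
            self.last_active_at = now
            self.state = State.IDLE

    def tick(self, now: float) -> None:
        """Apply the reclamation rules at time `now`."""
        if self.state is State.RECLAIMED:
            return
        too_old = now - self.started_at >= self.max_lifetime_s
        idle_too_long = (
            self.state is State.IDLE
            and now - self.last_active_at >= self.idle_timeout_s
        )
        if too_old or idle_too_long:
            self.state = State.RECLAIMED

    def wake(self, now: float) -> bool:
        """Resume work if the session is still alive; return False otherwise."""
        self.tick(now)
        if self.state is State.RECLAIMED:
            return False
        self.state = State.ACTIVE
        self.last_active_at = now
        return True


session = AgentSession()
session.go_idle(now=60)                     # starts waiting on a model call
woke_in_time = session.wake(now=10 * 60)    # back within the idle window

session.go_idle(now=11 * 60)
woke_too_late = session.wake(now=30 * 60)   # nineteen idle minutes: reclaimed
```

The point of the sketch is that "idle" is a first-class state with its own clock, which is exactly what request-response serverless never had to model.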

The most counterintuitive part isn’t the isolation tech. It’s the billing model. Across an agent’s lifetime, more than 70% of the clock is spent waiting on an LLM to respond or an external tool to return. Traditional serverless charges by invocation or by container-minute, which means every second an agent spends waiting is burning money, and none of that waiting has anything to do with the user’s experience. Alibaba’s AgentRun says the quiet part out loud with split billing for active versus idle. Active execution is charged normally on vCPU plus memory, and a waiting agent pays only a tiny fee to keep its memory warm. The 60% TCO reduction they advertise is really just them not charging you for something they never should have charged for. AWS AgentCore bills vCPU-hour and GB-hour by the second and doesn’t charge CPU while you wait on the model. LangGraph Platform splits “node execution” and “production standby per minute” into two separate lines. It looks like a billing trick, but underneath it’s the first time the cloud’s pricing philosophy has admitted that a workload which mostly waits on someone else is fundamentally not the same animal as request-response.
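A back-of-the-envelope calculation makes the difference concrete. This is a sketch under stated assumptions: the rates below are made-up placeholders, not any provider's price list, and the split model is a simplification in which CPU is billed only while the agent is active and memory is billed for the whole session.

```python
def flat_cost(wall_s, vcpus, mem_gb, cpu_rate, mem_rate):
    """Bill CPU and memory for every second of wall-clock time."""
    hours = wall_s / 3600
    return hours * (vcpus * cpu_rate + mem_gb * mem_rate)


def split_cost(wall_s, wait_fraction, vcpus, mem_gb, cpu_rate, mem_rate):
    """Bill CPU only while active; bill memory for the whole session."""
    hours = wall_s / 3600
    active_hours = hours * (1 - wait_fraction)
    return active_hours * vcpus * cpu_rate + hours * mem_gb * mem_rate


# Hypothetical rates in dollars: per vCPU-hour and per GB-hour.
CPU_RATE = 0.10
MEM_RATE = 0.01

# One-hour session on 1 vCPU and 2 GB, waiting 70% of the time.
flat = flat_cost(3600, vcpus=1, mem_gb=2, cpu_rate=CPU_RATE, mem_rate=MEM_RATE)
split = split_cost(3600, wait_fraction=0.7, vcpus=1, mem_gb=2,
                   cpu_rate=CPU_RATE, mem_rate=MEM_RATE)
saving = 1 - split / flat

print(f"flat: ${flat:.3f}  split: ${split:.3f}  saving: {saving:.0%}")
```

With these placeholder numbers the flat bill is $0.12 and the split bill is $0.05, a saving of roughly 58%. That figure is a product of the assumptions, not a check on any advertised number, but it shows why the gap widens as the waiting share grows.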

Another badly underrated story is that MCP won the tool layer in 17 months. It started as an Anthropic-led protocol, and by December 2025 it had been donated to the Agentic AI Foundation under the Linux Foundation. Public MCP servers went from a little over a thousand at the start of the year to more than 9,400, with native support across AWS, Azure, GCP, Alibaba, Volcano, OpenAI, and Cloudflare. That’s faster than Kubernetes consolidated container orchestration, and the reason is simple. Everyone wants to build an agent platform, and nobody has any incentive to invent yet another incompatible tool protocol. Better to stop fighting over the standard and pour the energy into gateways, auth, and observability, where you can actually differentiate. Volcano’s AgentKit gateway pitches “turn any HTTP interface into MCP with zero changes,” Azure Foundry’s MCP server hangs Entra identity right off it, and the next battle has clearly moved to the MCP server registry layer and agent-to-agent protocols like A2A.
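To see why a gateway can wrap an existing HTTP interface without touching it, it helps to look at how little translation is involved. MCP messages are JSON-RPC 2.0, tools are listed with a name, a description, and a JSON Schema `inputSchema`, and a tool is invoked with the `tools/call` method. The sketch below is an assumption-laden illustration, not Volcano's or anyone else's gateway: the endpoint, its URL, and the helper functions are hypothetical, and no request is actually sent.

```python
import json

# Hypothetical description of an existing HTTP endpoint (not a real service).
ENDPOINT = {
    "name": "get_order",
    "method": "GET",
    "url": "https://api.example.com/orders/{order_id}",
    "description": "Look up an order by id.",
    "params": {"order_id": "string"},
}


def to_mcp_tool(endpoint: dict) -> dict:
    """Describe the endpoint in the shape a `tools/list` result uses."""
    return {
        "name": endpoint["name"],
        "description": endpoint["description"],
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in endpoint["params"].items()},
            "required": list(endpoint["params"]),
        },
    }


def to_http_request(endpoint: dict, rpc_request: dict) -> dict:
    """Translate a `tools/call` JSON-RPC request into an HTTP request plan."""
    if rpc_request.get("method") != "tools/call":
        raise ValueError("only tools/call is handled in this sketch")
    params = rpc_request["params"]
    if params["name"] != endpoint["name"]:
        raise ValueError(f"unknown tool: {params['name']}")
    return {
        "method": endpoint["method"],
        "url": endpoint["url"].format(**params["arguments"]),
    }


# What an agent would send to the gateway.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_order", "arguments": {"order_id": "A123"}},
}

tool = to_mcp_tool(ENDPOINT)
plan = to_http_request(ENDPOINT, call)
print(json.dumps(tool, indent=2))
print(plan)
```

The translation itself is mechanical, which is why the differentiation has moved to what surrounds it: auth, rate limits, registries, and observability.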

The shift in the memory layer is quieter but more consequential. A year ago memory was something you bolted on inside a LangChain module, maintaining your own vector store and session table by hand. Now AWS has carved AgentCore Memory into its own SKU, GCP’s Memory Bank goes GA in 2026, Azure Foundry Memory is in preview, Volcano’s AgentKit has its own memory module, and Letta turned “long-term memory plus git-backed context plus sleep-time compute” into the product itself. The underlying judgment is unanimous. For an agent to deliver real productivity, it needs memory that persists across sessions and devices and keeps accumulating. Otherwise every conversation is disposable and no enterprise can embed it into a real workflow. Memory is going from an optional framework module to a first-class citizen of the runtime. Memory-first startups like Mem0 and Letta have some runway left, but getting absorbed into the clouds as a built-in is only a matter of time.
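The structural change is small to state: memory is keyed by the user, not by the session. A minimal sketch of that idea follows, assuming a hypothetical `MemoryStore` class built on Python's standard `sqlite3` module. Real memory services add extraction, embedding-based retrieval, and consolidation on top, none of which is modeled here.

```python
import sqlite3


class MemoryStore:
    """Hypothetical long-term memory keyed by user, not by session."""

    def __init__(self, path: str = ":memory:") -> None:
        # A file path would persist across processes;
        # ":memory:" keeps this demo self-contained.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "user_id TEXT NOT NULL, session_id TEXT NOT NULL, fact TEXT NOT NULL)"
        )

    def remember(self, user_id: str, session_id: str, fact: str) -> None:
        """Record a fact learned during one session."""
        self.db.execute(
            "INSERT INTO memories (user_id, session_id, fact) VALUES (?, ?, ?)",
            (user_id, session_id, fact),
        )
        self.db.commit()

    def recall(self, user_id: str) -> list[str]:
        """Return everything known about a user, across all sessions."""
        rows = self.db.execute(
            "SELECT fact FROM memories WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        )
        return [fact for (fact,) in rows]


store = MemoryStore()
store.remember("alice", "laptop-monday", "prefers invoices in EUR")
store.remember("alice", "phone-friday", "approver is the finance lead")
store.remember("bob", "laptop-monday", "works in UTC+8")

# A new session on any device starts from the accumulated facts.
facts = store.recall("alice")
```

Once the store lives in the runtime instead of in application code, every new session can start from `recall` without the developer maintaining the table.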

The contrast between the global and Chinese landscapes is unusually clear this round. Abroad it’s a division of labor. The three big clouds fight over infra, LangChain and CrewAI and Letta sell frameworks, OpenAI and Anthropic sell models, and enterprises assemble the best combination per use case. In China, Alibaba, Volcano, Tencent, Baidu, and Huawei are colliding head-on, each chasing an end-to-end closed loop. Alibaba runs AgentRun plus Bailian plus an Agent Store plus Alipay for monetization. Volcano runs AgentKit plus Doubao plus HiAgent plus the open and commercial versions of Coze. Those two have the strongest infra in China right now. Tencent Yuanqi leans toward consumers and small merchants, Baidu’s Qianfan AppBuilder is solid in narrow niches like document parsing but less transparent on infra, and Huawei leans on industry templates and Ascend compute. OpenAI, meanwhile, simply admitted the Assistants API failed and is sunsetting it for good on August 26, 2026, handing off to the Responses API and a new Agents SDK, while Microsoft’s Agent Framework merges Semantic Kernel and AutoGen. Every framework vendor this year is making the same move: collapse the old framework into a new graph and a new SDK.

Pull all the threads together and the real moat in this round of agent infra isn’t a slightly better framework or a slightly stronger model. It’s whether you can fuse runtime isolation, billing model, MCP gateway, and long-term memory into a single thing an enterprise is willing to bet on for years. Frameworks turn over every couple of years and models iterate every six months, but once a runtime and a billing model are wired into production, the cost of migrating off is brutal. That’s exactly the moat logic cloud providers know best. They’re just not selling CPU and bandwidth this time. They’re selling a new capability: how to let an agent wait, cheaply. So the next thing worth watching isn’t which provider ships another agent. It’s who, once memory and runtime and tools have all become infrastructure, first builds a genuinely profitable SaaS on top of that layer, instead of stopping at one more demo or one more workflow editor.
