Renting Compute From Three Clouds Is the Default Now

The companies with the most control over chip supply on the planet still rent across three cloud providers. That is the fact that should reset how a platform team thinks about AI infrastructure. If a frontier lab with custom silicon deals and more than a million accelerators already running its models cannot single-source compute, the 200-person team running model-serving in production has no business betting on one provider either.

Read the numbers from the lab itself. Anthropic states plainly that it runs Claude across three silicon families and three clouds at the same time: “We train and run Claude on a range of AI hardware — AWS Trainium, Google TPUs, and NVIDIA GPUs… Claude remains the only frontier AI model available to customers on all three of the world’s largest cloud platforms: AWS (Bedrock), Google Cloud (Vertex AI), and Microsoft Azure (Foundry).” That is from Anthropic’s own partnership announcement. They do not frame it as insurance. They frame it as matching workloads to the chips best suited for them, which buys better performance and more resilience.

The Money Says This Is the Baseline, Not a Side Bet

Hedging is small. What Anthropic is doing is not small.

On the AWS side, the commitment runs over $100 billion and up to 5 gigawatts across a ten-year span. More than a million Trainium2 chips are already training and serving Claude through Project Rainier, and AWS is named the primary training and cloud provider. The commitment spans Graviton CPUs and the Trainium2-through-Trainium4 custom silicon line.

On the Azure side, Anthropic committed $30 billion in compute plus up to a gigawatt of NVIDIA Grace Blackwell and Vera Rubin capacity. In the same deal, Microsoft and NVIDIA are investing up to $5 billion and up to $10 billion in Anthropic, respectively. On top of that, a multi-gigawatt Google and Broadcom TPU buildout is due to come online in 2027.

Stack those up. Over $100 billion on AWS, $30 billion on Azure, multi-gigawatt on Google. A company does not spread that kind of capital across three vendors as a defensive crouch. It does it because that is what running serious AI workloads at scale actually requires. Anthropic’s run-rate revenue passed $30 billion this year, up from roughly $9 billion at the end of 2025. They are diversifying providers while they scale, not because anyone is forcing their hand.

The Silicon Layer Is Multi-Vendor Too

It is tempting to read “multi-cloud” as a billing decision — three vendors, three invoices, one abstraction over commodity GPUs underneath. That is not what is happening here. The diversification goes all the way down to the chip.

The hardware list is AWS Trainium2 through Trainium4 and Graviton, Google TPUs built with Broadcom, and NVIDIA Grace Blackwell and Vera Rubin. And the supplier set is still growing. Anthropic is now reportedly in talks to rent servers running on Microsoft-designed chips, with Azure usage rising since November 2025, per The Information. If those talks land, that would be a fourth distinct silicon path entering the mix.

Different chips have different strengths for different parts of the workload. Trainium is cost-efficient for large training runs. TPUs have their own profile for certain matrix shapes. NVIDIA’s parts lead on raw flexibility and tooling maturity. Routing the right workload to the right silicon is an engineering decision with real performance and cost consequences, and it only works if your serving layer can target more than one backend.
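To make the routing idea concrete, here is a minimal sketch of a preference table that maps a workload type to an ordered list of backends and picks the first one that is currently available. Everything in it is an assumption for illustration: the backend names, the workload categories, and especially the preference order, which in practice would come from your own benchmarks and price sheets rather than from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str      # hypothetical identifier used by the serving layer
    silicon: str   # chip family behind it


# Hypothetical backends; names are placeholders, not real endpoints.
BACKENDS = {
    "trainium": Backend("aws-trainium", "Trainium"),
    "tpu": Backend("gcp-tpu", "TPU"),
    "gpu": Backend("nvidia-gpu", "NVIDIA GPU"),
}

# Preference order per workload kind. These orderings are placeholders;
# replace them with results from your own measurements.
ROUTES = {
    "large_training": ["trainium", "tpu", "gpu"],
    "batch_inference": ["tpu", "trainium", "gpu"],
    "interactive_inference": ["gpu", "tpu", "trainium"],
}


def pick_backend(workload: str, available: set) -> Backend:
    """Return the first preferred backend that is currently available."""
    for key in ROUTES.get(workload, []):
        if key in available:
            return BACKENDS[key]
    raise LookupError(f"no available backend for workload {workload!r}")
```

The useful property is that the table is data, not code. Changing which silicon a workload prefers is an edit to `ROUTES`, and a backend dropping out of `available` degrades to the next choice instead of failing the request.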

What This Means for a 200-Person Platform Team

The lesson transfers directly, and it cuts against a posture you still hear in platform-engineering circles: pick one cloud, go deep, standardize everything on its managed services, and treat portability as premature optimization. For most of the stack, that posture is defensible. The managed database, the queue, the object store — going deep on one provider there saves real time.

The AI-serving layer is the exception, and the frontier labs just told you why. If the company with the most control over its own chip supply still cannot single-source compute or silicon, your model-serving layer cannot bet on a single backend either. The constraints that force diversification at the top — capacity availability, price per token, chip-to-workload fit, supply timing — show up at every scale below it. You will not get a million chips allocated, but you will hit GPU availability walls in a region, price changes on a managed inference endpoint, and a quota that does not move when you need it to.
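A quota wall or a regional capacity shortage looks the same from the application's side: the call fails and something else has to answer. The sketch below shows one way to handle that, trying backends in order and falling through on a capacity error. `CapacityError`, `primary`, and `secondary` are hypothetical stand-ins; a real version would translate each provider's throttling and quota errors into one shared exception type.

```python
from typing import Callable, Sequence, Tuple


class CapacityError(Exception):
    """Hypothetical error: the backend has no capacity or quota right now."""


# For this sketch a backend is just a callable from prompt to completion.
InferFn = Callable[[str], str]


def infer_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, InferFn]]
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, completion)."""
    errors = []
    for name, infer in backends:
        try:
            return name, infer(prompt)
        except CapacityError as exc:
            # Record the failure and move on to the next backend.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends exhausted: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Simulates a provider that is out of quota.
    raise CapacityError("regional quota exceeded")


def secondary(prompt: str) -> str:
    # Simulates a healthy second provider.
    return f"echo: {prompt}"


used, text = infer_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
print(used, text)  # prints: secondary echo: hello
```

Only capacity errors trigger the fallback here. Other exceptions propagate, on the assumption that a malformed request will fail the same way on every backend.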

So treat portability of the serving layer as an architecture requirement, the same way you treat authentication or observability as a requirement. Concretely, that means a few things. Keep an inference abstraction between your application code and any single provider’s SDK, so swapping the backend is a config change and not a rewrite. Avoid building hard dependencies on one vendor’s proprietary serving features unless you have a deliberate reason and an exit plan. Keep your model weights and serving stack in a form you can stand up on more than one provider’s accelerators. Run at least a smoke-test path on a second backend continuously, so “we could move” is a tested claim and not a hope.
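Here is roughly what the first and last of those points can look like in code: a thin interface between application code and provider SDKs, a backend chosen by configuration, and a smoke test that can run against every registered backend. This is a sketch under stated assumptions. The class names, the `INFERENCE_BACKEND` environment variable, and the two stub providers are all hypothetical; real implementations would wrap each provider's actual SDK behind the same `complete` method.

```python
import os
from typing import Dict, Optional, Protocol


class InferenceBackend(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class ProviderABackend:
    """Stub standing in for a wrapper around one provider's SDK."""

    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class ProviderBBackend:
    """Stub standing in for a wrapper around a second provider's SDK."""

    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"


# Hypothetical registry: backend name in config -> implementation class.
REGISTRY: Dict[str, type] = {
    "provider-a": ProviderABackend,
    "provider-b": ProviderBBackend,
}


def load_backend(name: Optional[str] = None) -> InferenceBackend:
    """Pick the backend from an argument or the environment, not from code."""
    name = name or os.environ.get("INFERENCE_BACKEND", "provider-a")
    if name not in REGISTRY:
        raise ValueError(f"unknown backend {name!r}")
    return REGISTRY[name]()


def smoke_test(backend: InferenceBackend) -> bool:
    """Cheapest possible end-to-end check: one prompt, one non-empty reply."""
    reply = backend.complete("ping")
    return isinstance(reply, str) and len(reply) > 0


def answer(question: str, backend: InferenceBackend) -> str:
    # Application code sees only the interface, never a provider SDK.
    return backend.complete(question)
```

Running `smoke_test(load_backend(name))` for every name in `REGISTRY` on a schedule is what turns "we could move" into a tested claim. Swapping the production backend is then a change to `INFERENCE_BACKEND`, with no edit to `answer` or its callers.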

This is not a call to run everything everywhere all the time. Multi-cloud as a blanket strategy is expensive and usually a mistake. The point is narrower and load-bearing: the inference path is the one place where single-provider lock-in is now a standing liability, because the supply dynamics above you guarantee you will eventually need to move some of it.

The Default Has Already Shifted

A year ago, spreading inference across providers and chip families read like something only the largest labs could justify. The receipts say it is now the operating baseline for anyone running frontier models. It is stated in the lab’s own words and backed by more than $130 billion in commitments on AWS and Azure alone, with a multi-gigawatt Google TPU buildout on top and a fourth silicon path reportedly under discussion.

When the baseline at the top of the market moves, the architecture expectations below it move with it. Single-cloud AI strategy used to be the safe default. It is now the position you have to justify. Build the serving layer so the backend is a choice you keep making, not a decision you made once and cannot revisit.