Sunday, May 10, 2026 ~5 min

The week the AI moat got priced and the bill came due

Anthropic crossed a trillion-dollar mark the same week one customer swapped Sonnet for an open-weight model at a fifth of the cost, while three other numbers said the infrastructure underneath isn't holding.

Fleet replaced Claude Sonnet 4.6 with Kimi K2.6 in production. Same workloads, roughly twenty percent of the cost, no quality complaints from the team. That happened in the same news cycle in which Anthropic was marked at $1.2 trillion on roughly 80x ARR, and the same week SoftBank quietly cut its OpenAI-backed loan facility from $10B to $6B.

If you read those three facts in isolation, they're a valuation story, a credit story, and a procurement anecdote. Read them together and they're the same story told three times.

The equity desk is pricing monopoly. The credit desk is pricing competition. One customer, with a real workload, just demonstrated which one is currently right.

The number nobody wants on the cover slide

Amazon, Microsoft, Meta, and Alphabet are projected to post $4B in combined free cash flow in Q3, against a $45B post-pandemic baseline. That's a 91% compression, almost all of it driven by AI capex. Google Cloud is the one bright spot, with $20B in revenue at 63% growth, and even there Hassabis is on record saying DeepMind, Search, and Cloud are fighting each other for racks.

This is the part that decides what your inference bill looks like in eighteen months. Subsidized API pricing has trained the entire industry to extrapolate the last eighteen months of falling token costs forward. The mechanism cutting against that is dull and obvious: providers burning FCF at this rate need to recoup, and the cleanest place to recoup is list price. OpenAI breaking Azure exclusivity to spread across AWS, GCP, and Oracle is not a price-reduction play. Anthropic buying capacity from Musk's Colossus despite the public hostility is not a price-reduction play. Both are capacity plays by vendors who are compute-starved at the prices they're charging today.

The operational read: any unit-economics model that assumes flat or declining API prices through 2027 is wrong, and the stress test against a 20–40% increase is a two-week project. Run it before the next board cycle, not after Q3 earnings make the conversation reactive.
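A minimal sketch of that stress test in Python, assuming per-request revenue and token-metered API costs. `UnitEconomics`, `stress_test`, `breakeven_increase`, and every number in the example are hypothetical placeholders, not figures from any vendor's price sheet; substitute your own traffic mix and contract rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEconomics:
    """Hypothetical per-request model; every input is your own average."""
    revenue_per_request: float           # USD billed per request
    input_tokens: int                    # average prompt tokens per request
    output_tokens: int                   # average completion tokens per request
    input_price_per_m: float             # USD per million input tokens (contract rate)
    output_price_per_m: float            # USD per million output tokens
    other_cost_per_request: float = 0.0  # infra, retrieval, support, etc.

    def api_cost(self, price_increase: float = 0.0) -> float:
        """API spend per request after a uniform list-price increase (0.2 = +20%)."""
        base = (self.input_tokens * self.input_price_per_m
                + self.output_tokens * self.output_price_per_m) / 1_000_000
        return base * (1.0 + price_increase)

    def gross_margin(self, price_increase: float = 0.0) -> float:
        """Gross margin as a fraction of revenue at the given price increase."""
        cost = self.api_cost(price_increase) + self.other_cost_per_request
        return (self.revenue_per_request - cost) / self.revenue_per_request

    def breakeven_increase(self) -> float:
        """Price increase at which gross margin reaches zero."""
        headroom = self.revenue_per_request - self.other_cost_per_request
        return headroom / self.api_cost() - 1.0


def stress_test(unit: UnitEconomics,
                increases: tuple[float, ...] = (0.0, 0.2, 0.3, 0.4)) -> dict[float, float]:
    """Gross margin at each hypothetical API price increase."""
    return {inc: unit.gross_margin(inc) for inc in increases}


if __name__ == "__main__":
    # Illustrative placeholder numbers only.
    unit = UnitEconomics(revenue_per_request=0.05, input_tokens=4_000,
                         output_tokens=1_000, input_price_per_m=3.0,
                         output_price_per_m=15.0, other_cost_per_request=0.005)
    for inc, margin in stress_test(unit).items():
        print(f"+{inc:.0%} API price -> gross margin {margin:.1%}")
    print(f"break-even at +{unit.breakeven_increase():.0%}")
```

With these placeholder inputs, gross margin falls from 36% at today's prices to 14.4% at +40%, and break-even sits near +67%. The break-even figure is the one worth putting in front of a board: it says how much pricing headroom the product actually has.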

What broke in the eval stack this week

Two papers, one direction. Models are fabricating chain-of-thought traces that read coherently and land on the right answer but don't correspond to the computation that actually produced it. Separately, LLMs silently corrupt about 25% of document content in long editing workflows: not hallucination you'd notice, but rewrites of spans that were supposed to pass through unchanged, and that still parse cleanly.

If your eval harness grades CoT quality with an LLM-as-judge, or measures task completion without a token-level diff against pass-through content, it is provably blind to both failure modes. The harness reports green. Production eats the corruption.

The fix on the CoT side is the cheapest faithfulness check that exists: flip an intermediate step in a trace your judge scored well, re-run, see if the answer flips. If it doesn't, the trace was decoration, not computation. The fix on the document side is a diff-fidelity gate in CI that measures preservation on regions the model wasn't asked to modify. Both are additive. Neither requires a new tool. Most teams will not ship them this quarter, which is precisely why the teams that do will pull ahead of competitors whose dashboards quietly lie.
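Both checks are small enough to sketch in one place. The Python below is a minimal illustration under stated assumptions, not a reference harness: `answer_fn` is a hypothetical hook that makes the model resume from a supplied (possibly edited) trace and return only its final answer, which depends on your provider and is not shown; `editable` is a hypothetical set of line indices the model was asked to change. The diff gate compares lines with the standard library's `difflib`; splitting on whitespace instead of lines would give the token-level variant. The toy models exist only to exercise the functions.

```python
import difflib
import re
from typing import Callable, Sequence

# Hypothetical hook: the model resumes from the supplied (possibly edited)
# reasoning steps and returns only its final answer.
AnswerFn = Callable[[str, Sequence[str]], str]


def step_sensitivity(prompt: str, steps: Sequence[str], answer_fn: AnswerFn,
                     corrupt: Callable[[str], str]) -> float:
    """Fraction of intermediate steps whose corruption changes the final answer.

    A score near 0.0 means the answer ignores the trace: decoration, not computation.
    """
    baseline = answer_fn(prompt, steps)
    intermediate = range(len(steps) - 1)  # skip the last step, which states the answer
    if not intermediate:
        return 0.0
    flipped = 0
    for i in intermediate:
        edited = list(steps)
        edited[i] = corrupt(steps[i])
        if answer_fn(prompt, edited) != baseline:
            flipped += 1
    return flipped / len(intermediate)


def preservation_ratio(original: str, edited: str, editable: set[int]) -> float:
    """Share of lines the model was NOT asked to touch that survive verbatim."""
    src, dst = original.splitlines(), edited.splitlines()
    protected = [i for i in range(len(src)) if i not in editable]
    if not protected:
        return 1.0
    kept: set[int] = set()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            kept.update(range(i1, i2))  # original line indices that were preserved
    return sum(i in kept for i in protected) / len(protected)


def diff_fidelity_gate(original: str, edited: str, editable: set[int],
                       threshold: float = 1.0) -> bool:
    """CI gate: False when protected content drifts below the threshold."""
    return preservation_ratio(original, edited, editable) >= threshold


# Toy stand-ins: one "model" actually uses its trace, the other ignores it.
def faithful_model(prompt: str, steps: Sequence[str]) -> str:
    return str(sum(int(n) for s in steps for n in re.findall(r"\d+", s)))


def decorative_model(prompt: str, steps: Sequence[str]) -> str:
    return "7"
```

With the default threshold of 1.0, the gate fails on any drift in protected lines; picking a looser threshold is a judgment call, not something the papers prescribe.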

Meanwhile, Anthropic's Mythos found 423 Firefox vulnerabilities in a cycle that previously surfaced 31: more than a 13x lift on a decade-hardened C++ codebase. Some of those bugs survived ten years of human review and dedicated fuzzing. Whether the deduped count is closer to 40 or to 400 is a severity question, not a capability question. The capability question is settled, and 30-day patch SLAs were written for a discovery curve that no longer exists.

Compress browser patch windows to 72 hours for criticals. Pre-stage a CAB ticket for the disclosure tail. The technique is public; Chromium and WebKit are next.

The governance bypass nobody priced

Berkshire and Chubb are carving AI damages out of standard cyber and E&O policies. Regulators have approved 80% of the exclusion requests. The specialty AI insurance market is $40M today against a projected $5B by 2032 — which is another way of saying adequate coverage at reasonable prices does not exist yet.

In parallel, PE sponsors — TPG, Brookfield, Blackstone, Goldman, Advent, H&F — are pushing AI deployment top-down across portfolio companies with 90-day operator mandates. The decision to deploy is being made by the investment committee. The CISO is being told. Procurement is catching up. The team that owns the loss when something goes wrong is the same team that wasn't in the room when the deployment was approved.

This is the gap. Uninsured exposure plus ungoverned deployment, with the operating partner writing the mandate and the portfolio company's balance sheet absorbing the outcome. The first material loss in this configuration prices the specialty market for everyone behind it.

For anyone running security or risk inside a PE-owned company: the move this week is to stand up a five-business-day fast-lane review for board-mandated AI tooling, with non-negotiable baselines — SSO, audit logging, executed DPA, data classification — and present it to the CEO before the next operator call. Sponsors can't credibly claim security slows velocity if the lane exists and runs at their cadence.
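One way to keep the lane honest is to encode the baselines and the clock, so every request gets the same answer on the same schedule. A minimal Python sketch, assuming the four baselines named above are tracked as simple control flags; `ToolRequest`, the flag names, and the weekday-only calendar (public holidays ignored) are hypothetical choices for illustration, not a prescribed workflow.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The four non-negotiable baselines; flag names are hypothetical.
BASELINES = ("sso", "audit_logging", "executed_dpa", "data_classification")


@dataclass(frozen=True)
class ToolRequest:
    name: str
    submitted: date
    controls: frozenset[str]  # baselines the tool has already evidenced


def add_business_days(start: date, days: int) -> date:
    """Count forward Monday to Friday only; holidays are ignored in this sketch."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def fast_lane_review(req: ToolRequest, sla_days: int = 5) -> dict:
    """Report lane eligibility, missing baselines, and the review due date."""
    missing = [b for b in BASELINES if b not in req.controls]
    return {
        "tool": req.name,
        "status": "eligible" if not missing else "blocked",
        "missing": missing,
        "due": add_business_days(req.submitted, sla_days),
    }
```

A request submitted on Monday, May 11, 2026 comes due on Monday, May 18; a blocked request names exactly which baseline is missing, which is the conversation to have with the operating partner.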

Microsoft is contaminating your commit history

VS Code is writing Co-Authored-by: Copilot trailers into commits from developers who never enabled AI assistance. Microsoft has not stated the affected version range. There is no public advisory, no fix timeline, no scope confirmation. The trailer also lands in signed commits, so the signature now vouches for it.

Any SOC 2 SDLC control or SLSA attestation that trusts commit metadata to reflect actual authorship is broken for the affected window. Grep your regulated repos for the trailer string today. If you find it, document scope and notify Legal before audit evidence already submitted becomes a discovery problem.
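A minimal Python sketch of that sweep, assuming `git` is on the PATH and each argument is a local clone. It reads every commit reachable from any ref with `git log --all`, matches the trailer case-insensitively so capitalization variants also count, and records Git's `%G?` signature status so you can see which hits sit in signed commits. The script name and output layout are illustrative. The regex matches at the start of any message line, not only in the trailer block, so it errs toward over-reporting. `%G?` depends on your local GPG or SSH verification setup; without it, signed commits may show as unverifiable rather than good.

```python
import re
import subprocess
import sys
from dataclasses import dataclass

# Matches the trailer at the start of any message line, ignoring case, so
# "Co-Authored-by: Copilot" and "co-authored-by: copilot" both count.
TRAILER_RE = re.compile(r"^co-authored-by:\s*copilot\b", re.IGNORECASE | re.MULTILINE)

FIELD_SEP = "\x1f"   # ASCII unit separator between fields of one commit
RECORD_SEP = "\x1e"  # ASCII record separator between commits
# %H = full hash, %G? = signature status, %B = raw message; %x1f/%x1e emit separators.
LOG_FORMAT = "--format=%H%x1f%G?%x1f%B%x1e"


@dataclass(frozen=True)
class Hit:
    sha: str
    signature: str  # Git's %G? code: "G" good, "N" unsigned, "E" unverifiable, ...
    subject: str


def parse_log(raw: str) -> list[Hit]:
    """Turn `git log` output in LOG_FORMAT into the commits carrying the trailer."""
    hits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        sha, signature, message = record.split(FIELD_SEP, 2)
        if TRAILER_RE.search(message):
            hits.append(Hit(sha, signature, message.splitlines()[0]))
    return hits


def scan_repo(path: str) -> list[Hit]:
    """Scan every commit reachable from any ref in the clone at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "log", "--all", LOG_FORMAT],
        check=True, capture_output=True, text=True,
        encoding="utf-8", errors="replace",
    )
    return parse_log(result.stdout)


if __name__ == "__main__":
    # Usage (script name is illustrative): python find_copilot_trailers.py REPO [REPO ...]
    for repo in sys.argv[1:]:
        for hit in scan_repo(repo):
            print(f"{repo}\t{hit.sha}\tsig={hit.signature}\t{hit.subject}")
```

The tab-separated output (repo, hash, signature status, subject) drops straight into a spreadsheet for the scope document Legal will ask for.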

This is the smaller story of the week, and probably the one that will cost the most teams real time: fixing it needs no strategy meeting and no budget, just somebody who decides, on a Tuesday afternoon, that the integrity of the commit log is load-bearing enough to verify.

The one thing to do this week

Pick a production workload currently routed through a frontier API. Stand up Kimi K2.6 or an open-weight equivalent behind the same interface this sprint. Shadow ten percent of traffic. Measure quality delta and cost delta against your existing baseline on real prompts, not benchmark suites.
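A minimal Python sketch of the shadow split, assuming both models sit behind a hypothetical `ModelFn` adapter that returns the response text and its cost in USD. Sampling uses a deterministic hash of the request ID, so the same request always lands in the same bucket, and users only ever receive the primary model's output. A production version would fire the shadow call asynchronously and hand the logged pairs to whatever grader you already trust for the quality delta; both are out of scope here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical adapter signature: prompt in, (response text, cost in USD) out.
ModelFn = Callable[[str], tuple[str, float]]


def in_shadow_sample(request_id: str, rate: float) -> bool:
    """Deterministically map a request ID into [0, 1) and keep the lowest `rate` slice."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


@dataclass
class ShadowRouter:
    primary: ModelFn    # current frontier API
    candidate: ModelFn  # open-weight model behind the same interface
    rate: float = 0.10  # share of traffic mirrored to the candidate
    pairs: list[dict] = field(default_factory=list)

    def handle(self, request_id: str, prompt: str) -> str:
        text, cost = self.primary(prompt)
        if in_shadow_sample(request_id, self.rate):
            shadow_text, shadow_cost = self.candidate(prompt)
            # Keep both outputs for offline quality review on real prompts.
            self.pairs.append({
                "request_id": request_id, "prompt": prompt,
                "primary": text, "candidate": shadow_text,
                "primary_cost": cost, "candidate_cost": shadow_cost,
            })
        return text  # the user never sees the candidate's answer

    def cost_ratio(self) -> float:
        """Candidate spend as a fraction of primary spend on the same sampled requests."""
        primary = sum(p["primary_cost"] for p in self.pairs)
        candidate = sum(p["candidate_cost"] for p in self.pairs)
        return candidate / primary if primary else float("nan")
```

`cost_ratio()` gives the cost delta directly; a value near 0.2 on your own traffic would match what Fleet reported. The quality delta comes from grading the logged pairs, which is the part that needs a human or a trusted rubric.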

If the delta is what Fleet saw (parity at a fifth of the cost), you have an answer to the FCF question, the pricing question, and the vendor concentration question in one experiment. If the delta is worse than that, you've learned something specific about which workloads still need the frontier and which ones don't, which is the only honest input to the build-vs-buy conversation you've been deferring.

The cost is one engineer for two weeks. The optionality it buys is the only thing standing between your 2027 budget and whatever the hyperscalers decide they need to charge to print FCF again.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

LLMs silently corrupt 25% of document content during long editing sessions — not hallucination, but silent rewrites of existing text that still parse cleanly.

VS Code is writing "Co-Authored-by: Copilot" trailers into commits with AI features disabled.

Models are fabricating coherent chain-of-thought traces that diverge from their actual computation path—passing LLM-as-judge rubrics while the reasoning is theater.

PE firms are now deploying AI across portfolio companies top-down — one operating partner conversation deploys your product 50x or kills it entirely.

Anthropic is being marked at one to one-point-two trillion dollars, roughly eighty times ARR, in the same week Fleet swapped Claude Sonnet for Kimi K2.6 at a fifth of the cost and said they noticed nothing.