Saturday, June 6, 2026 ~4 min

The cost of being a single-vendor shop just got named

Anthropic killed the third-party Claude subsidy with a June 15 deadline, admitted an 80x capacity miss, and is now running inference on xAI's hardware. Re-cost your stack before the next invoice does it for you.

Anthropic converted Claude subscriptions to dollar-matched API credits this week. Starting June 15, Claude usage through Cursor, Cline, Zed, OpenCode, Conductor, and every other third-party harness draws from a separate credit pool sized to your plan's dollar value, then bills at full API rates. The 70-90% effective discount that subsidized power users for the last year is gone. Same prompts, same outputs, new bill — 3-10x higher for any team that adopted Claude through a non-Anthropic harness.

In the same week, Dario Amodei admitted Anthropic planned for 10x growth and got 80x. That's the capacity miss that quietly degraded Claude Code through April — features nerfed without changelog entries, corporate accounts banned without warning, latency drift on long-context requests. The fix is a lease on xAI's entire Colossus 1 cluster, 220,000+ GPUs, from a CEO who has publicly called Anthropic "misanthropic and evil." Inference for the #1 enterprise AI vendor now transits infrastructure owned by a competitor with no contractual reason to keep it transiting.

ServiceNow burned its full-year Anthropic budget by May and discovered Anthropic ships no per-user telemetry, no cost attribution, no SLAs. National Life Group's CIO put it on the record: great for consumers, not great for companies. This is the vendor Ramp's data shows leading enterprise B2B at 34.4% to OpenAI's 32.3% — the first crossover, and the largest AI vendor most enterprise DPAs don't cover correctly.

Three things stopped being optional this week

The pricing change, the capacity admission, and the Colossus lease are not three stories. They're one story about what happens when a foundation model vendor optimizes for an October IPO.

First, the cost model. If your developers use Claude through any third-party tool, your per-developer assumption is off by an order of magnitude in 30 days. OpenAI is offering two months of free Codex to enterprise teams that switch within 30 days — expires July 13. Whether or not you switch, running that benchmark generates the comparison data you'll need at the next renewal. The data is free. The decision is yours.

Second, the telemetry gap. ServiceNow's blowup happened because nobody was watching the meter. The fix is an LLM gateway — LiteLLM, Portkey, or your own — with per-user, per-feature tagging and daily budget alerts. Caveat: LiteLLM hit CISA's KEV catalog this week with active exploitation in the wild. If you're running 1.81.16 through 1.83.7, upgrade tonight and rotate every provider API key it stored. The tool you need to instrument cost is the same tool that's currently being weaponized.

Third, vendor concentration. An 80x capacity miss with no SLA is a quantified outage risk. The architecture answer is multi-provider routing behind an abstraction that can fail over mid-run without losing agent context. Vercel's production data — 200,000 teams, seven months — already shows this is standard practice: Anthropic captures 61% of spend on Opus reasoning, Google captures 38% of volume on Flash throughput. Single-vendor pitches are pricing a world that ended sometime last year.

The eval harness is measuring the wrong workload

The same Vercel data buries a number worth sitting with: 59% of token volume is now agentic — multi-turn, tool-calling, stateful between requests. Most eval harnesses still score single-turn responses against reference answers. That was correct in 2023 and measures the minority of 2026 production traffic. Cost models built on 3:1 input-output ratios are off by roughly 5x against agentic traces that run closer to 15:1.

The Mozilla-versus-curl comparison from this week is the cleanest evidence of what actually drives outcomes. Same Claude model family. Mozilla's custom harness — fuzzer-integrated, ephemeral VMs, sanitizer-truth signal — surfaced 271 Firefox bugs including sandbox escapes. Stenberg ran the same model as a generic scanner against curl and got one real CVE out of five claimed. A 271:1 yield ratio on the same weights. The harness is the product. Teams debating Claude versus GPT versus Gemini are optimizing the variable that matters least.

The operational consequence: trajectory-level metrics belong in your eval stack this sprint. Task success, tool-call precision and recall, steps-to-completion, cost-per-successful-task. If 59% of your tokens are agentic and 100% of your evals are single-turn, you are flying instruments-out and the variance you can't see is the variance that pages you.

What to actually do this week

Audit every Claude-backed workload — Agent SDK, GitHub Actions, batch evals, any developer tool routing through a third party — and reconcile projected token burn against the new credit cap before next Friday. Stand up an LLM gateway with per-user tagging and budget alerts in the same sprint. Qualify at least one non-Anthropic provider for your top three internal AI workloads and document the migration path. Run the OpenAI Codex benchmark on representative work before July 13.

And while you have the patch window open: Traefik shipped a CVSS 10.0 auth bypass this week, NGINX disclosed an 18-year-old pre-auth RCE in the rewrite module, Argo CD lets any authenticated user read plaintext Kubernetes Secrets, and PraisonAI was weaponized four hours after disclosure. The June 15 invoice will arrive whether or not your perimeter is patched. Patch the perimeter first.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

The cost of being a single-vendor shop just got named

Three things stopped being optional this week

The eval harness is measuring the wrong workload

What to actually do this week

Six specialist takes that fed this piece.

The NGINX rewrite module carries an 18-year-old pre-auth RCE disclosed today.

Anthropic ended the flat-rate Claude subsidy this week.

Anthropic eliminates the 70-90% implicit discount on third-party Claude tool usage starting June 15 — and OpenAI is offering 2 months free Codex to enterprise teams who switch within 30 days.

Anthropic's Mythos cleared both UK AISI simulated attack ranges this week, a first, while TrustedSec demonstrated that all five major commercial EDR products share architectures an AI reverse-engineers in days rather than weeks.

Anthropic edged OpenAI in enterprise billing on Ramp last week, 34.4 percent to 32.3, in the same week ServiceNow admitted it had burned its entire annual Claude budget by May.