What changed in Anthropic's Claude subscription pricing?

Claude subscriptions now convert to dollar-matched API credits for all programmatic usage, including Agent SDK, claude-p, GitHub Actions, and third-party harnesses. This eliminates what had been a 70-90% effective discount on batch and eval workloads, so any pipeline budgeted against flat subscription cost is now charging at full API rates.

Why are single-turn eval harnesses inadequate now?

Vercel gateway telemetry shows 59% of token volume is agentic multi-turn tool-calling, while most harnesses still measure single-turn accuracy on the 41% minority. Agentic traces also run at roughly 15:1 input-to-output versus the historical 3:1, so cost models calibrated on single-turn assumptions can be off by about 5x. Track tool-call F1, steps-to-completion, and cost-per-successful-task instead.

Is Token Superposition Training worth piloting now?

Yes — it's the lowest-risk efficiency bet on the table. TST is a pretraining recipe change with no inference-side impact, claiming 2-3x wall-clock improvement at matched FLOPs, validated from 270M up to a 10B-A1B MoE. Even a 1.6x replication on a continued-pretraining spike pays for itself immediately, since the serving architecture is unchanged.

Does the OpenAI Codex switch promo justify a real evaluation?

It's an asymmetric bet worth taking if you have a provider abstraction layer. The 2-month-free Codex enterprise promo lets you run matched prompts against your Claude harness at zero marginal cost, and Ramp's April data already shows a genuinely multi-vendor market (34.4% Anthropic vs 32.3% OpenAI on card spend). Worst case you validate confidence in Claude; best case you find a cost-competitive alternative.

Edition 2026-05-22 · read as Data Science

AnthropicEndsClaudeSubscriptionSubsidy,EvalCostsSpike

Sources: 36
Words: 1,362
Read: 7min

Topics Agentic AI LLM Inference AI Capital

◆ The signal

Anthropic quietly metered Claude subscriptions to dollar-matched API credits, removing what had been a 70-90% effective subsidy on Agent SDK, GitHub Actions, and third-party harness calls. OpenAI announced a 2-month-free Codex enterprise switch promo the same day. The thing the pricing page doesn't tell you: any eval harness or batch pipeline budgeted against flat subscription cost is now charging at API rates, and the overrun shows up in this week's token burn, not next quarter's review.

Key facts

Anthropic converted Claude subscriptions to dollar-matched API credits, eliminating a 70-90% effective subsidy on Agent SDK, GitHub Actions, and third-party harness calls.
OpenAI launched a 2-month-free Codex enterprise switch promo on the same day as Anthropic's pricing change, with Ramp's April data showing Anthropic at 34.4% vs OpenAI's 32.3% B2B share.
Vercel AI Gateway telemetry from 200,000 teams shows 59% of token volume is now agentic, with Anthropic capturing 61% of spend and Google capturing 38% of volume.
Datology's VLM achieved +11.7 points across 20 benchmarks at 17x less training compute and 3.3x lower response FLOPs than Qwen3-VL-4B.
Anthropic is leasing xAI's entire Colossus 1 cluster of 220,000+ GPUs to handle 80x growth versus a planned 10x, while targeting an October IPO.

◆ INTELLIGENCE MAP

01
Anthropic Triple Pricing Shock + Capacity Crisis
act now
Anthropic metered subscriptions at API rates, tripled Opus 4.7 image cost, and announced a June 15 third-party tool credit split — all while admitting an 80x capacity miss that forced leasing xAI's 220K-GPU Colossus 1 cluster. ServiceNow burned its full-year Claude budget by May. Any Claude cost model built before this week is wrong.
80x
capacity plan miss
9
sources
- B2B share (Ramp)
- ARR growth (4mo)
- Colossus GPUs
- Image cost change
- June 15 credit split
1. Planned growth10
2. Actual growth80
3. Capacity gap8
02
59% Agentic Volume: Eval and Cost Models Obsolete
act now
Vercel's AI Gateway (200K teams, 7 months) reports 59% of tokens are now agentic multi-turn traffic. Anthropic captures 61% of spend via Opus; Google captures 38% of volume via Flash. Cost models built on 3:1 input-output ratios are off by ~5x; single-turn eval harnesses score the minority of production traffic.
59%
agentic token share
4
sources
- Anthropic spend share
- Google volume share
- I/O ratio (agentic)
- MCP token overhead
1. Agentic tokens59
2. Single-turn tokens41
03
Training Efficiency Breakthroughs: 2-360x Gains
monitor
Three independent results shift pretraining and distillation economics: Nous TST delivers 2-3x wall-clock speedup at matched FLOPs with no inference change (validated 270M→10B). NVIDIA Star Elastic produces model-size families at 360x less cost than pretraining each. Datology beats InternVL3.5-2B by 10 pts at 17x less compute via pure data curation.
17x
compute reduction (VLM)
2
sources
- TST speedup
- Star Elastic savings
- Datology bench gain
- Datology compute cut
1. Nous TST3
2. NVIDIA Star Elastic360
3. Datology curation17
04
Compute Supply Crunch: 4:1 Demand and Neocloud Boom
monitor
Nebius reported 684% YoY revenue growth with 4+ customers per GPU, guiding $3-3.4B for 2026 vs $530M in 2025. Cerebras IPO'd at $56B with a $20B OpenAI commitment. Cisco AI orders jumping from $5B to $9B with explicit memory-hardware shortage. H2 capacity priced on today's availability is likely mispriced.
4:1
GPU demand-to-supply
5
sources
- Nebius 2026 guide
- Nebius YoY growth
- Cerebras valuation
- Cisco AI order jump
1. Nebius 2025530
2. Nebius 2026 guide3200
05
AI Cyber Capability: AISI Ranges Saturating
background
Anthropic's Mythos cleared both AISI attack ranges (first model ever). GPT-5.5-cyber cleared one. Both achieved 'full network takeover' in controlled environments — a step-function above prior gen's 'advanced persistence.' AISI is already building harder tests. Google confirmed a threat actor using AI to build cybercrime tooling in the wild.
2/2
AISI ranges cleared
4
sources
- Mythos ranges
- GPT-5.5 ranges
- Prior gen ceiling
- Palo Alto scan
1. 01Mythos (new)Full takeover
2. 02GPT-5.5-cyberPartial takeover
3. 03Mythos (prior)Adv. persistence

◆ DEEP DIVES

Anthropic's Pricing Earthquake: Three Simultaneous Cost Shocks Hit Your Claude Stack

What Happened

The headline change: Claude subscriptions now convert to dollar-matched API credits for all programmatic usage, including Agent SDK, claude-p, GitHub Actions, and third-party harnesses. The implicit 70-90% effective discount is gone. In the same week, Opus 4.7 tripled image processing cost, and starting June 15, third-party tool usage (Zed, Conductor, OpenCode, T3 Code) moves to a separate credit bucket equal to plan value with no rollover and overflow at API rates.

The driver behind the pricing is capacity. Anthropic planned for 10x growth and is seeing 80x. The emergency fix is leasing xAI's entire Colossus 1 cluster, 220,000+ GPUs spanning H100, H200, and GB200. A CFO is in seat and the company is targeting an October IPO, which is a reasonable proxy for why margin-per-token is now a board metric.

The Capacity Context

The pricing changes read more cleanly alongside the capacity numbers. ServiceNow's CDIO burned through a full-year Claude budget by May. National Life Group's CIO called Claude 'not great for companies that want per-user monitoring,' and Anthropic ships no native per-user telemetry and no SLAs, which is unusual for a dependency sitting on production critical paths.

Change	Impact	Timeline
Subscription → API credits	70-90% discount gone on batch/eval workloads	Immediate
Opus 4.7 image cost	3x on multimodal pipelines	Immediate
June 15 third-party split	No subsidized tokens for Zed/OpenCode/T3	30 days
Colossus integration	p95/p99 variance during heterogeneous fleet merge	Weeks–months

Any Claude benchmark from before May 7 is stale, and any cost model built against flat subscription rates is not directionally wrong, it is numerically wrong.

The OpenAI Counter-Move

Sam Altman posted a 2-month-free Codex enterprise switch promo on the same day Anthropic metered subscriptions. Ramp's April data showed Anthropic edging OpenAI for the first time, 34.4% vs 32.3%. The promo is an asymmetric-payoff bet: free to evaluate, with bounded switching cost if you already have a provider abstraction layer. OpenAI is pricing directly into the developer cohort Anthropic just alienated.

Cross-Source Tension

Sources disagree on the durability of Anthropic's lead. Ramp data is a card-spend proxy and measures who gets billed, not token volume or production criticality. A 20-seat pilot weighs the same as a company at inference scale, and OpenAI notes large enterprises rarely pay by card. The directional signal looks real; the magnitude does not. What is not uncertain is that the market is now genuinely multi-vendor, and architecture should reflect that.

Action items

Audit every Claude-backed workload (Agent SDK, GitHub Actions, batch evals) and reconcile projected token burn against the new credit cap by end of this week
Deploy an LLM gateway (LiteLLM, Portkey) with per-user, per-feature tagging and daily budget alerts within this sprint
Run a 2-month Codex evaluation under OpenAI's enterprise switch promo with matched prompts against your Claude harness
Reforecast Claude inference spend for any team using Zed/Conductor/OpenCode modeling the post-June-15 scenario

Sources:Claude just metered your agent SDK calls · Claude Code latency on long-context requests drifted upward · Anthropic ships no per-user usage telemetry · Anthropic passes OpenAI in B2B · Vercel published a number worth sitting with · Ramp's AI Index shows Anthropic at 34.4%

59% Agentic Volume: Your Eval Harness and Cost Model Are Measuring the Minority

The Production Shift, Quantified

Vercel's AI Gateway telemetry — 200,000 teams over 7 months — reports that 59% of all token volume is now agentic: multi-turn, tool-calling traces, not single-shot completions. Six months ago the share was under 20%. The interesting structure is the spend-volume split. Anthropic captures 61% of spend via Opus on reasoning and planning nodes, while Google captures 38% of volume via Flash on throughput and utility calls. Vendor loyalty is not visible in the data. Customers churn freely.

This is not a leaderboard result. It is production telemetry from a multi-tenant gateway, which means most tokens in the wild are inside multi-step tool loops with retries, not the single-turn completions an eval harness was built to score.

What Breaks at 59%

Eval harnesses: single-turn accuracy on held-out prompts measures the 41% minority. The thing this doesn't tell you is whether a planner burns 40K tokens arguing with itself before giving up. Pass rate looks fine. Spend is 5x budget. Cost models: most were calibrated when input-output ratios sat at 3:1. Agentic traces run at ~15:1 on input, with heavy cache reuse on some providers and none on others. A forecast carried over from last year's ratio is off by roughly 5x on spend.

Metric	Single-Turn World	59% Agentic World
Input:Output ratio	~3:1	~15:1
Cost driver	Output tokens	Trajectory length × retries
Eval metric	Accuracy on final answer	Cost-per-successful-task
Routing unit	Single request	Session with KV cache state
Failure mode	Wrong answer	Correct answer at 10x budget

If 59% of your tokens are agentic but 100% of your evals are single-turn, you're flying instruments-out — update the harness before you update the model.

The Routing Architecture That Fits

The Vercel pattern is consistent with what Abridge disclosed across 80M+ clinical conversations: a constellation of models, cheap fast triage in front, expensive reasoning behind, per-task selection. Given those numbers, the plausible envelope for cost reduction from tiered routing is 20-40% at constant trajectory completion rate. Glean's published benchmark puts off-the-shelf MCP at 30% more tokens than retrieval-tuned knowledge graphs. The methodology is not disclosed and the source is the vendor, so treat it as directional. It is consistent with the MCP context-window bloat people have measured independently.

The Enterprise Convergence

SAP (€100M partner investment) and ServiceNow (Action Fabric) have landed in the same place: agents need Knowledge Graph grounding + MCP-exposed workflows. The architectural drift is from RAG-over-docs to structured, entity-resolved context. The eval needs to move with it. Tool-use accuracy against grounded context is the production differentiator. It is on no public leaderboard.

Action items

Add trajectory-level metrics to your eval harness this sprint: tool-call F1, steps-to-completion, cost-per-successful-task, recovery-from-error rate
Instrument per-node token cost across your agent graphs and route utility calls (summarization, extraction, query rewriting) to Flash/Haiku-class models within 2 weeks
Run a 1-hour spike measuring token overhead of current MCP/tool-calling setup vs. a retrieval-first baseline on 100 production traces
Add LLM-judge ↔ human-annotator agreement as a tracked SLI this quarter; re-calibrate when judge model changes

Sources:Agentic traffic crossed fifty-nine percent · Vercel published a number worth sitting with · AI Gateway data puts agentic workloads at fifty-nine percent · Abridge runs model routing across 100M conversations · MCP plus knowledge graphs

Training Efficiency Frontier: Three Results That Change Unit Economics This Quarter

Three Independent Wins, One Direction

The marginal dollar in model development has moved from raw compute to architecture and curation. The evidence this week comes from three separate labs, each working a different layer of the training stack, and the results point the same way. The marginal dollar in model development has moved from raw compute to architecture and curation.

Work	Claim	Scale Validated	Inference Impact	Replication Risk
Nous Research TST	2-3x wall-clock at matched FLOPs	270M → 10B-A1B MoE	None — no architecture change	Medium; single-source, clean claim
NVIDIA Star Elastic	360x cheaper model-family derivation; 7x vs SOTA compression	Not specified	Produces family of sizes from one run	High; big number, lab-reported
Datology VLM	+11.7 pts on 20 benchmarks at 17x less compute	2B and 4B params	3.3x lower response FLOPs (serving win)	Medium; benchmark selection risk

TST Is the One to Spike First

Token Superposition Training is a pretraining recipe change with no inference-side downstream. If it replicates, it is a 2-3x improvement on wall-clock for free. The serving architecture is unchanged. The efficiency lives entirely in how gradients are computed during training. Validated from 270M up to a 10B-A1B MoE, which is the band most teams actually run for continued pretraining and domain adaptation.

Datology: Curation Beats Compute

Datology's result is the cleanest evidence so far that data curation now dominates compute scaling for VLMs. A 2B model lands about 10 points above InternVL3.5-2B at 17x less training compute, on data selection alone. At 4B, they reach near-frontier quality at 3.3x lower response FLOPs than Qwen3-VL-4B, which is a serving win and not just a training one. For any team maintaining a vision pipeline, this reorders the next budget line. Curation tooling before more GPUs.

Star Elastic: Speculative but High-Leverage

NVIDIA's claim that one post-training run produces a full family of model sizes at 360x lower cost than pretraining each is the kind of number that always shrinks under independent evaluation. The thing this number doesn't tell you is what holds at the small end versus the large end. Even a 30x hold would restructure how teams produce size tiers for routing. Paired with the 59% agentic routing pattern, cheap access to a 1B-to-70B spectrum from a single run is what a tiered architecture actually needs.

TST is worth a spike on your next continued-pretraining run. If wall-clock comes in at even 1.6x with no val-loss regression, it pays for itself immediately.

The Parallel Signal: Only 15% Are Ready

Fivetran's readiness index reports that 15% of organizations have the data foundation for agentic AI. Data quality and lineage are the top blocker, cited by roughly 50%. Read against the efficiency results, the picture is consistent. Training is getting cheaper. Most teams cannot capitalize because the data layer is not there. Half the agent projects funded this quarter are data-platform projects with an agent bolted on.

Action items

Spike Token Superposition Training on a 1B-param continued-pretraining run against a matched-FLOPs baseline within the next 2 weeks
Run an ablation benchmarking Qwen-2.5/3 or DeepSeek-V3 against your current production model on in-domain evals this quarter
Audit ANALYZE/compute-stats coverage across top-20 Iceberg/Delta tables; add stats freshness to table-level SLAs
Score target domains against Fivetran readiness dimensions (quality, lineage, governance) before greenlighting agentic AI projects

Sources:Claude just metered your agent SDK calls · DuckDB shipped a client-server mode this week

◆ QUICK HITS

DuckDB shipped Quack HTTP client-server protocol — credible replacement for Spark-on-Glue for sub-100GB jobs; default config is localhost-only, no SSL, token auth
DuckDB shipped a client-server mode this week
Kafka Share Groups report ~linear 8x throughput at 32 consumers on I/O-bound workloads — decouple consumer parallelism from partition count for embedding/enrichment pipelines
DuckDB shipped a client-server mode this week
Update: Mythos cleared both AISI attack ranges (first model ever) — capability stepped from 'advanced persistence' to 'full network takeover'; AISI is building harder tests because current ladder is saturating
Mythos cleared the AISI attack ranges this week
Apache Iceberg CVE-2026-42812 (CVSS 9.9): attacker with table-write permission can redirect metadata to attacker-controlled S3, poisoning training data on next read — audit Polaris catalog write-path allowlisting
LiteLLM landed in the KEV catalog this week
Gemini reproducibly outputs real phone numbers from training data — three independent extraction cases; add PII canary + divergence-attack probes to LLM CI before next release
Gemini is the latest model to surface PII
Duolingo publicly pegs AI-generated content rejection at ~20% requiring human QC — use as calibration anchor for your own generation pipeline acceptance rate
Duolingo's twenty percent AI slop rate
TML-Interaction-Small hits 0.40s turn-taking latency vs 0.57s Gemini Flash and 1.18s GPT-Realtime — full-duplex voice becoming its own model class with 200ms micro-turn architecture
TML is reporting 0.40 seconds of full-duplex latency
SWE-ZERO-12M-trajectories released: 112B tokens, 12M trajectories, 122K PRs across 3K repos and 16 languages — largest open agentic trace corpus; pull before licensing frictions arrive
Claude just metered your agent SDK calls
Microsoft's agent memory architecture (consolidation + forgetting + delayed maturation) stabilizes at 400-500 memories with 97.2% retention precision — alternative to bloated prompts or flat vector top-k
DuckDB shipped a client-server mode this week
Persona drift measurable within 8 dialogue turns per Li et al. (COLM 2024) — embed a verbal tic as a free drift canary via regex; costs one hour to instrument
AI personas drift within eight turns

◆ Bottom line

The take.

Anthropic killed the flat-rate developer discount, tripled image costs, and announced a June 15 credit split — all while 59% of production tokens are now agentic and your eval harness still measures single-turn completions. The two cheapest things you can do this week: reconcile every Claude workload against the new metered credit cap, and add trajectory-level cost-per-successful-task to the eval harness that's currently scoring the minority of your traffic.

Frequently asked

What changed in Anthropic's Claude subscription pricing?: Claude subscriptions now convert to dollar-matched API credits for all programmatic usage, including Agent SDK, claude-p, GitHub Actions, and third-party harnesses. This eliminates what had been a 70-90% effective discount on batch and eval workloads, so any pipeline budgeted against flat subscription cost is now charging at full API rates.
How should I reforecast Claude spend before the June 15 third-party tool change?: Model the post-June-15 scenario for any team using Zed, Conductor, OpenCode, or T3 Code, since third-party tool usage moves to a separate credit bucket equal to plan value with no rollover and overflow at API rates. Combine this with a per-workload audit of Agent SDK and batch eval token burn to catch silent overruns before credits exhaust mid-month.
Why are single-turn eval harnesses inadequate now?: Vercel gateway telemetry shows 59% of token volume is agentic multi-turn tool-calling, while most harnesses still measure single-turn accuracy on the 41% minority. Agentic traces also run at roughly 15:1 input-to-output versus the historical 3:1, so cost models calibrated on single-turn assumptions can be off by about 5x. Track tool-call F1, steps-to-completion, and cost-per-successful-task instead.
Is Token Superposition Training worth piloting now?: Yes — it's the lowest-risk efficiency bet on the table. TST is a pretraining recipe change with no inference-side impact, claiming 2-3x wall-clock improvement at matched FLOPs, validated from 270M up to a 10B-A1B MoE. Even a 1.6x replication on a continued-pretraining spike pays for itself immediately, since the serving architecture is unchanged.
Does the OpenAI Codex switch promo justify a real evaluation?: It's an asymmetric bet worth taking if you have a provider abstraction layer. The 2-month-free Codex enterprise promo lets you run matched prompts against your Claude harness at zero marginal cost, and Ramp's April data already shows a genuinely multi-vendor market (34.4% Anthropic vs 32.3% OpenAI on card spend). Worst case you validate confidence in Claude; best case you find a cost-competitive alternative.

◆ Same day, different angle

Read this day as…

◆ Recent in data science

AnthropicEndsClaudeSubscriptionSubsidy,EvalCostsSpike

◆ INTELLIGENCE MAP

◆ DEEP DIVES

What Happened

The Capacity Context

The OpenAI Counter-Move

Cross-Source Tension

The Production Shift, Quantified

What Breaks at 59%

The Routing Architecture That Fits

The Enterprise Convergence

Three Independent Wins, One Direction

TST Is the One to Spike First

Datology: Curation Beats Compute

Star Elastic: Speculative but High-Leverage

The Parallel Signal: Only 15% Are Ready

◆ QUICK HITS

The take.

Frequently asked

◆ RELATED THREADS