Edition 2026-06-06 · read as Data Science

Anthropic Ends Claude Subsidy as Agent Tokens Hit 59%

Sources: 36 · Words: 1,717 · Read: 9 min

Topics: Agentic AI · LLM Inference · AI Capital

◆ The signal

Anthropic ended the flat-rate Claude subsidy this week. Programmatic calls now bill at metered API rates, in the same week Vercel's production telemetry put 59% of inference tokens inside agentic multi-turn traces rather than single-shot completions. The thing the old subscription price didn't measure was workload shape, and the workload shape moved. Any Claude-backed agent workflow still costed on subscription economics needs to be re-run against metered rates before June 15. Skipping that exercise is a pricing decision, just not a deliberate one.

Key facts

Anthropic ended its flat-rate Claude subsidy and now bills programmatic calls at metered API rates, with a hard cutover for third-party tools on June 15, 2026.
Vercel's AI Gateway production telemetry across 200,000 teams shows agentic multi-turn traces account for 59% of all inference tokens, with Anthropic taking 61% of spend and Google's Flash taking 38% of volume.
Anthropic is leasing xAI's entire Colossus 1 cluster of 220,000+ H100, H200, and GB200 GPUs and is targeting an October 2026 IPO after Dario Amodei reported 80x revenue and usage growth against a 10x plan.
Mozilla's custom agentic harness surfaced 271 Firefox bugs using Claude Mythos Preview, while the same model run as a generic scanner on curl yielded only 1 confirmed low-severity CVE — a 271:1 yield ratio driven by harness design.
Anthropic's Mythos became the first model to clear both AISI attack ranges by completing full network takeover, and Google confirmed the first in-the-wild threat actor using AI to build cybercrime tooling.

◆ INTELLIGENCE MAP

01 · Anthropic's Triple Shock: Metered Credits, 80x Growth vs. 10x Plan, Market Lead · act now
Anthropic converted subscriptions to dollar-matched API credits (killing 70-90% effective discounts), admitted 80x growth against a capacity plan built for 10x, and leased xAI's entire 220K-GPU Colossus 1 cluster. ServiceNow burned its full-year Claude budget by May. Ramp shows Anthropic at 34.4% vs OpenAI at 32.3%, the first crossover.
Headline: 80x growth vs capacity plan · 12 sources
Key figures: Ramp share, Anthropic 34.4% · Ramp share, OpenAI 32.3% · Colossus GPUs leased 220K+ · Effective discount lost 70-90%

02 · 59% Agentic Token Share Breaks Eval & Cost Models · act now
Vercel's AI Gateway production index shows 59% of tokens are multi-turn, tool-calling agentic traces. Anthropic captures 61% of spend (Opus for reasoning), Google captures 38% of volume (Flash for throughput). Single-turn eval harnesses now measure the minority of production traffic. Cost models built on 3:1 I/O ratios are off by ~5x.
Headline: 59% agentic token share · 5 sources
Key figures: Agentic (multi-turn) token share 59% · Single-shot 41% · Anthropic spend share 61% · Google volume share 38% · I/O ratio (agentic) ~15:1

03 · AI Cyber Capability Clears Autonomous-Exploit Threshold · monitor
Anthropic's Mythos is the first model to clear both AISI simulated attack ranges (full network takeover). Mozilla's custom harness yielded 271 Firefox bugs vs. curl's 1 CVE with the same model, a 271:1 harness-engineering delta. Google confirmed a threat actor using AI to build cybercrime tooling. MDASH shipped 16 real Windows patches from multi-model bug hunting.
Headline: 271:1 harness yield ratio · 7 sources
Key figures: Mozilla (custom harness) bugs found 271 · curl (generic scan) bugs found 1 · MDASH (multi-model) Windows fixes 16 · AISI ranges cleared 2 of 2

04 · Training Efficiency Breakthroughs: 2x to 360x Cost Cuts · monitor
Three research drops change pre-training and distillation economics. Nous TST: 2-3x wall-clock speedup at matched FLOPs with no inference architecture change (validated 270M→10B). Datology: +11.7 pts on VLM benchmarks at 17x less training compute via pure data curation. NVIDIA Star Elastic: one post-training run produces a model family at 360x lower cost.
Headline: 17x compute reduction (VLM) · 2 sources
Key figures: Nous TST speedup up to 3x · Datology curation compute cut 17x · Star Elastic savings 360x · Datology benchmark lift +11.7 pts

05 · Compute Crunch Quantified: 4:1 Demand, Siting Backlash, Silicon Diversification · background
Nebius reports 4+ customers per GPU brought online, 684% YoY revenue growth, guiding $3-3.4B for 2026. Cisco AI orders jumping $5B→$9B. Cerebras IPO'd at $56B with OpenAI's $20B commitment. The 9GW Stratos project faces 4,000 complaints and a referendum. Inference hardware is diversifying but supply remains structurally tight.
Headline: 4:1 GPU demand/supply ratio · 5 sources
Key figures: Nebius YoY growth 684% · Cerebras valuation $56B · Cisco AI order growth $5B→$9B · Stratos complaints 4,000
Chart ($M): Nebius 2025 revenue 530 · Nebius 2026 guide 3,200

◆ DEEP DIVES

Anthropic's Pricing Reset: Your Claude Cost Model Broke Three Ways This Week

The Convergence

Anthropic is leasing xAI's entire Colossus 1 cluster, 220,000+ GPUs spanning H100, H200, and GB200, and targeting an October IPO. That is the context for the pricing change underneath it. Claude subscriptions now convert to dollar-matched API credits across Agent SDK, claude-p, GitHub Actions, and third-party harnesses, which removes the 70-90% effective discount power users extracted from Max plans. Dario Amodei admitted planning for 10x growth and hitting 80x in revenue and usage, which is why Claude Code degraded through April. It was a capacity miss, not a product decision, and the capacity fix is what the Colossus lease pays for. Production routing decisions should be made against this combined picture, not any single fact in it.

ServiceNow's CDIO already burned the full-year Claude budget by May. National Life Group's CIO called Claude 'great for consumer usage but not great for companies' that want per-user monitoring. Anthropic provides no native per-user telemetry, no SLAs on latency or availability, and no budget alerts.
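The missing per-user telemetry and budget alerts can be approximated in a thin layer in front of the provider. The sketch below is a minimal in-memory ledger, not the API of LiteLLM, Portkey, or Anthropic; the class name `SpendLedger`, its methods, the user and feature tags, and the dollar figures are all hypothetical.

```python
from collections import defaultdict


class SpendLedger:
    """Tracks spend per (day, user, feature) and flags days that exceed a budget."""

    def __init__(self, daily_budget_usd: float) -> None:
        self.daily_budget_usd = daily_budget_usd
        # Key: (day, user, feature) -> accumulated cost in USD.
        self._spend: dict[tuple[str, str, str], float] = defaultdict(float)

    def record(self, day: str, user: str, feature: str, cost_usd: float) -> None:
        """Attribute one request's cost to a user and a feature tag."""
        self._spend[(day, user, feature)] += cost_usd

    def by_user(self, day: str) -> dict[str, float]:
        """Per-user totals for a single day."""
        totals: dict[str, float] = defaultdict(float)
        for (d, user, _feature), cost in self._spend.items():
            if d == day:
                totals[user] += cost
        return dict(totals)

    def over_budget(self, day: str) -> bool:
        """True when the day's total spend exceeds the configured budget."""
        return sum(self.by_user(day).values()) > self.daily_budget_usd


# Hypothetical usage: two tagged workloads on the same day.
ledger = SpendLedger(daily_budget_usd=100.0)
ledger.record("2026-06-01", "alice", "code-review", 60.0)
ledger.record("2026-06-01", "bob", "batch-evals", 50.0)
```

In a real deployment the `record` call would sit wherever request cost is computed, and `over_budget` would feed whatever alerting channel the team already uses.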

Why Sources Disagree

Ramp's data shows Anthropic at 34.4% vs OpenAI at 32.3% of paying businesses, the first crossover. OpenAI's objection is correct on its own terms: Ramp measures credit-card spend, not invoice-based enterprise contracts. The crossover is real for bottoms-up developer adoption. It likely overstates Anthropic's lead among $1M+ ACV accounts. Both can be true at the same time, and a routing policy should be informed by both.

The vendor underneath most production stacks just converted from a developer-friendly flat rate to metered API economics. It is also leasing a competitor's datacenter to serve existing customers, with no SLA. Multi-provider routing stopped being optional.

The June 15 Cliff

Starting June 15, Claude usage through third-party tools (Conductor, Zed, OpenCode, T3 Code) gets a separate credit bucket equal to plan value. No subsidized tokens, no rollover, and overflow bills at API rates. Any cost model that assumed flat-rate Claude consumption through these tools is dead as of that date.
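To re-run a cost model under the credit-bucket rule, price the month's tokens at metered rates and subtract the plan value. This is a rough sketch of that arithmetic: the per-million-token prices, the plan value, and the token counts are placeholder assumptions, not Anthropic's published rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def metered_cost(usage: Usage, price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Cost in USD if every token is billed at metered per-million-token rates."""
    return (
        usage.input_tokens * price_in_per_mtok
        + usage.output_tokens * price_out_per_mtok
    ) / 1_000_000


def overflow_bill(usage: Usage, plan_value_usd: float,
                  price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Amount billed beyond the credit bucket; credits do not roll over."""
    cost = metered_cost(usage, price_in_per_mtok, price_out_per_mtok)
    return max(0.0, cost - plan_value_usd)


# Hypothetical month: 400M input tokens, 20M output tokens, $200 plan,
# placeholder prices of $3 (input) and $15 (output) per million tokens.
month = Usage(input_tokens=400_000_000, output_tokens=20_000_000)
overflow = overflow_bill(month, plan_value_usd=200.0,
                         price_in_per_mtok=3.0, price_out_per_mtok=15.0)
```

With these placeholder numbers the metered cost is $1,500 and the overflow is $1,300, which is the gap a flat-rate model would have hidden.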

What the Capacity Fix Changes

Surface	Before (April)	After (announced May 7-14)
Claude Code limits	5-hour cap	Doubled
Peak-hours throttle	Reduced limits	Removed (Pro/Max)
Opus API rates	Squeezed	'Substantially raised'
Fleet composition	Anthropic-managed	Heterogeneous (incl. GB200)

Any Claude benchmark run between mid-April and May 7 is contaminated for baselining. Re-run after the new caps land, not before. Otherwise capacity noise gets attributed to prompt or model changes, and the wrong variable gets the credit.

Action items

Audit every Claude-backed workload (Agent SDK, GitHub Actions, batch evals) and reconcile projected token burn against the new credit cap by end of next week
Deploy an LLM gateway (LiteLLM, Portkey) with per-user, per-feature tagging and daily budget alerts within this sprint
Add a second frontier provider with automatic failover on 429/5xx behind a router abstraction
Re-baseline Claude Code and Opus API benchmarks (throughput, p95 latency, rate-limit headroom) post-Colossus integration before locking Q3 architecture decisions
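The failover item above can be prototyped without committing to any vendor SDK. The sketch below assumes each provider is wrapped in a plain callable that raises a `ProviderError` carrying the HTTP status; `ProviderError`, `call_with_failover`, and the two stub providers are hypothetical names for illustration.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider wrapper when the upstream call fails."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


def is_retryable(status: int) -> bool:
    """Fail over on rate limits (429) and server errors (5xx) only."""
    return status == 429 or 500 <= status <= 599


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. 400/401: a different provider will not fix the request
            last_error = err
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real SDK wrappers.
def flaky_primary(prompt: str) -> str:
    raise ProviderError(429, "rate limited")


def steady_secondary(prompt: str) -> str:
    return "ok: " + prompt


result = call_with_failover(
    [("primary", flaky_primary), ("secondary", steady_secondary)], "ping"
)
```

Non-retryable errors are re-raised on purpose: failing over on a malformed request would only double the spend.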

Sources: Claude just metered your agent SDK calls · Claude Code latency on long-context requests drifted upward · Anthropic ships no per-user usage telemetry · Anthropic passes OpenAI in B2B · Vercel published a number worth sitting with · Agentic traffic crossed fifty-nine percent

59% Agentic: Your Eval Harness and Cost Model Are Measuring the Minority

The Number

Vercel's AI Gateway production index, drawn from 200,000 teams over 7 months, puts agentic workloads at 59% of all token volume. Anthropic takes 61% of spend through Opus on reasoning nodes. Google takes 38% of volume through Flash on throughput. Three different races. The leaderboard depends on which one you score.

What the number leaves out is the eval stack. Most eval harnesses still score single-turn responses against reference answers. That was the right design in 2023. It now measures the minority of 2026 production traffic. The median request is a multi-step tool loop with retries, and what breaks in prod is a planner burning 40,000 tokens arguing with itself before giving up.

Where Cost Models Break

Cost models were fit when input-output ratios sat near 3:1. Agentic traces run closer to 15:1 on input, with heavy cache reuse on some providers and none on others. A forecast built on last year's ratio is off by roughly 5x on spend, and the error is asymmetric across vendors.
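A quick way to sanity-check the forecast error is to compute the spend multiple directly. The 5x figure matches the jump in input volume per output token (15/3); how much of it lands on the invoice depends on the input and output prices, which are placeholders here. Cache discounts are ignored in this sketch.

```python
def cost_per_output_token(io_ratio: float, price_in: float, price_out: float) -> float:
    """Cost of one output token plus the input tokens that accompany it."""
    return io_ratio * price_in + price_out


def spend_multiple(old_ratio: float, new_ratio: float,
                   price_in: float, price_out: float) -> float:
    """How far actual spend exceeds a forecast fit on the old input:output ratio."""
    return (
        cost_per_output_token(new_ratio, price_in, price_out)
        / cost_per_output_token(old_ratio, price_in, price_out)
    )


# Upper bound: input cost dominates (output price treated as negligible).
upper = spend_multiple(3, 15, price_in=1.0, price_out=0.0)

# Placeholder pricing where an output token costs 5x an input token.
blended = spend_multiple(3, 15, price_in=1.0, price_out=5.0)
```

Under the placeholder 5:1 output-to-input price gap the multiple is 2.5x; it approaches 5x as input cost dominates the bill. Providers with heavy cache reuse pull the number down further, which is one reason the error is asymmetric across vendors.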

Glean's benchmark, vendor-published with methodology undisclosed, claims off-the-shelf MCP uses 30% more tokens and loses 2.5x head-to-head preference against an enterprise knowledge graph on agentic tasks. Read it as a hypothesis, not a result. The failure mode it points at is real: MCP tool listings balloon context windows.

If 59% of your tokens are agentic but 100% of your evals are single-turn, you're flying instruments-out — update the harness before you update the model.

The Routing Architecture That Emerged

The Vercel data shows a textbook tiered-routing signature already running across 200K teams:

Provider	Position	Implied Role
Anthropic (Opus)	61% of spend	Reasoning / planning nodes
Google (Flash)	38% of volume	High-throughput utility calls
OpenAI	Fast-growing share	Mixed; spiking post-model-update
Open source	Rising	Gaining traction, no loyalty

The explicit evidence of no vendor loyalty means a provider-agnostic routing layer isn't aspirational. It describes present-tense production reality. Application code still pinned to a single vendor's SDK is out of step with what the market is already doing.
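A provider-agnostic routing layer can start as a lookup from task type to model tier. The sketch below is one possible shape, using the utility-call examples named in this section's action items; the tier names and model identifiers are hypothetical placeholders, not real model IDs.

```python
# Task types treated as high-throughput utility calls (assumed taxonomy).
UTILITY_TASKS = {"summarization", "json_extraction", "query_rewriting"}

# Placeholder identifiers; swap in whatever models each tier maps to.
TIER_MODELS = {
    "reasoning": "reasoning-tier-model",
    "throughput": "throughput-tier-model",
}


def pick_tier(task_type: str) -> str:
    """Send known utility calls to the cheap tier; default everything else to reasoning."""
    return "throughput" if task_type in UTILITY_TASKS else "reasoning"


def pick_model(task_type: str) -> str:
    """Resolve a task type to the model configured for its tier."""
    return TIER_MODELS[pick_tier(task_type)]
```

Defaulting unknown task types to the reasoning tier is a conservative choice: it costs more, but it avoids silently degrading a planning step.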

Multi-Agent Decomposition Validates the Pattern

Microsoft's MDASH (100+ agents) beat Anthropic's Mythos on CyberGym by decomposing vulnerability work into scan → adversarial debate → PoC exploitation stages. No cost or latency comparison was published, which limits what you can conclude. The architectural signal still points the same direction: specialized routing across model tiers outperforms a single frontier model on complex tasks. The 59% agentic share and the MDASH result are two views of the same shift.

Action items

Add trajectory-level metrics to your eval harness this sprint: task success, tool-call precision/recall, steps-to-completion, cost-per-successful-task
Instrument per-node token cost in your agent pipelines and route utility calls (summarization, JSON extraction, query rewriting) to Flash/Haiku-class models
Run a 1-hour spike measuring token overhead of current MCP/tool-calling setup vs. a retrieval-first baseline on 100 production traces
Prototype a multi-agent decompose-debate-verify pipeline against your best single-agent baseline on a task with auto-verifiable outputs
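The trajectory-level metrics in the first action item can be computed from a small per-run record. This is a minimal sketch under stated assumptions: the `Trajectory` fields are hypothetical, tool calls are compared as sets of tool names (so repeated calls are deduplicated), and precision/recall are micro-averaged across runs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    succeeded: bool
    steps: int
    cost_usd: float
    tools_called: frozenset[str]
    tools_expected: frozenset[str]


def summarize(trajectories: list[Trajectory]) -> dict[str, float]:
    """Aggregate trajectory-level metrics over a batch of agent runs."""
    wins = [t for t in trajectories if t.succeeded]
    hits = sum(len(t.tools_called & t.tools_expected) for t in trajectories)
    called = sum(len(t.tools_called) for t in trajectories)
    expected = sum(len(t.tools_expected) for t in trajectories)
    total_cost = sum(t.cost_usd for t in trajectories)
    return {
        "task_success": len(wins) / len(trajectories) if trajectories else 0.0,
        "tool_precision": hits / called if called else 0.0,
        "tool_recall": hits / expected if expected else 0.0,
        # Averaged over successful runs only; infinite when nothing succeeded.
        "steps_to_completion": sum(t.steps for t in wins) / len(wins) if wins else math.inf,
        # Failed runs still cost money, so total cost is divided by successes.
        "cost_per_successful_task": total_cost / len(wins) if wins else math.inf,
    }


# Two hypothetical runs: one clean success, one failure with wasted tool calls.
runs = [
    Trajectory(True, 4, 0.25,
               frozenset({"search", "read_file"}),
               frozenset({"search", "read_file"})),
    Trajectory(False, 12, 0.75,
               frozenset({"search", "run_shell", "write_file"}),
               frozenset({"search", "read_file"})),
]
report = summarize(runs)
```

Charging failed runs against successful ones is the point of cost-per-successful-task: a planner that loops and gives up shows up in the denominator, which single-turn scoring never sees.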

Sources: Agentic traffic crossed fifty-nine percent · Vercel published a number worth sitting with · The CyberGym result · MCP plus knowledge graphs · ben's bites: Vercel AI Gateway

AISI Cleared, 271:1 Harness Delta, First AI Cybercrime in the Wild — The Offensive Capability Ceiling Just Moved

Three Data Points, One Direction

This week's evidence points at a specific capability boundary: end-to-end exploit chain completion in controlled evaluations, with corroborating signal from production work and the wild.

Mythos cleared both AISI attack ranges, the first model to complete full network takeover in controlled tests. The prior generation topped out at 'advanced persistence.' AISI is already building harder tests because the current ones are saturating.
Mozilla's custom harness surfaced 271 Firefox bugs (sandbox escapes, UAFs, race conditions) with the same model, Claude Mythos Preview, that found exactly 1 low-severity CVE in curl when run as a generic scanner. Same weights, 271:1 yield ratio.
Google confirmed a threat actor using AI to build cybercrime tooling, the first production-grade in-the-wild incident behind post-Mythos misuse concerns.

The Harness Is the Product

The Mozilla vs. curl comparison is the most instructive data point for any team shipping LLM-powered tools:

Dimension	Mozilla + Firefox	Stenberg + curl
Model	Claude Mythos Preview	Claude Mythos Preview
Harness	Custom agentic, fuzzer-integrated, ephemeral VMs	Out-of-box scan
Bugs surfaced	271 (incl. sandbox escapes)	5 claimed → 1 real CVE
False-positive rate	~0% (sanitizer crash = truth)	~80%

The former Google Distinguished Engineer on Mozilla's team said it directly: model choice was not the dominant factor; the harness was. That transfers. A team debating Claude vs GPT vs Gemini is optimizing the wrong variable. A week of domain-specific harness engineering yields 50x+ more signal than a model swap.

When a frontier model yields 271 bugs for one team and 1 CVE for another from the same weights, the harness is the product, not the model.

Implications for Teams Shipping Agents

The AISI result means refusal-rate harnesses are measuring the wrong bottleneck. Gating agent releases on jailbreak catch rates does not capture end-to-end exploit chain completion. A staged rubric covering recon, initial access, lateral movement, persistence, and exfil, run against every model upgrade, is a closer match to actual failure modes.
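A staged rubric can be encoded as an ordered list with a release gate on the furthest stage reached. The sketch below assumes a chain only counts while every earlier stage also completed, and that the gate threshold is a policy choice made by the team; the function names and the result format are hypothetical.

```python
from typing import Optional

# Stage order taken from the rubric described above.
STAGES = ["recon", "initial_access", "lateral_movement", "persistence", "exfil"]


def furthest_stage(completed: dict[str, bool]) -> Optional[str]:
    """Last stage reached in an unbroken chain from recon; None if recon failed."""
    reached = None
    for stage in STAGES:
        if not completed.get(stage, False):
            break
        reached = stage
    return reached


def passes_gate(completed: dict[str, bool], max_allowed: Optional[str]) -> bool:
    """Release gate: pass only if the model got no further than max_allowed."""
    reached = furthest_stage(completed)
    if reached is None:
        return True
    if max_allowed is None:
        return False
    return STAGES.index(reached) <= STAGES.index(max_allowed)


# Hypothetical eval result: the chain breaks at lateral movement.
result = {
    "recon": True,
    "initial_access": True,
    "lateral_movement": False,
    "persistence": True,
}
```

Running the same rubric against every model upgrade turns the furthest stage into a trend line, which a refusal-rate number cannot provide.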

Google's in-the-wild confirmation means offensive AI tooling is now a detected event class, not a tabletop exercise. The cost structure favors the attacker: inference is cheap, orchestration is cheap, and the expensive part used to be the human operator, which is what the model replaced.

PraisonAI as Case Study

PraisonAI (open-source multi-agent framework) was weaponized within 4 hours of CVE disclosure. Agent frameworks have crossed the adoption threshold where threat actors watch their disclosure feeds. Any runtime holding API keys or tool-call permissions sits in the blast radius.

Action items

Add a staged cyber-capability tier to your agent release gate (recon → lateral movement → persistence → exfil rubric) before the next model upgrade
Spike a domain-specific agentic harness on one internal tool (code review bot, data quality checker) modeled on Mozilla's pattern — reproducible test cases + ephemeral VMs + existing signal pipelines
Instrument agent action sequences in production logs and train a lightweight classifier on known-bad tool-call trajectories
Inventory all agent frameworks in use and set CVE-feed subscriptions with same-day patching SLA

Sources: Mythos cleared the AISI attack ranges · The headline claim is that AI models have reached full network takeover · Mozilla shipped 271 bugs · PraisonAI exploited in four hours · Google's threat tracker describes an industrialized guardrail-bypass stack · Anthropic published the case study this week

Training Efficiency Frontier Moved: Three Drops That Change the Unit Economics

Three Results, One Direction

Three research drops landed in the same week. Each one moves training economics in a direction that matters for anyone running pre-training, post-training, or distillation this quarter.

Work	Claim	Scale Validated	Inference Impact	Replication Risk
Nous TST	2-3x wall-clock at matched FLOPs	270M → 10B-A1B MoE	None — no architecture change	Medium; single-source, clean claim
Datology VLM curation	+11.7 pts on 20 VLM benchmarks; 17x less compute	2B and 4B params	Lower response FLOPs — real serving win	Medium; benchmark-selection risk
NVIDIA Star Elastic	360x cheaper model-family production; 7x vs SOTA compression	Not specified	Family of sizes from one run	High; big number, lab-reported

Which to Spike First

TST is the highest-signal, lowest-risk bet. It is a pretraining recipe change with no inference-side consequence. If it replicates, it is a free 2-3x. No new hardware, no architecture migration, no serving changes. Run it on a 1B continued-pretraining task against a matched-FLOPs baseline. If wall-clock comes in at 1.6x with no val-loss regression, it pays for itself on the next full run.
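The payback arithmetic for the spike is short enough to write down. A speedup of s at matched FLOPs saves a fraction 1 - 1/s of wall-clock time, so 1.6x saves 37.5%. The sketch below pairs that with a simple adopt/reject check; the validation-loss values and the tolerance are hypothetical, and TST itself is not implemented here.

```python
def wall_clock_saved(speedup: float) -> float:
    """Fraction of wall-clock time saved by a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def worth_adopting(speedup: float, baseline_val_loss: float,
                   candidate_val_loss: float, tolerance: float = 0.0) -> bool:
    """Adopt only if the run is faster and validation loss has not regressed."""
    return speedup > 1.0 and candidate_val_loss <= baseline_val_loss + tolerance


# 1.6x is the break-even example from the text; losses are placeholders.
saved = wall_clock_saved(1.6)
adopt = worth_adopting(1.6, baseline_val_loss=2.31, candidate_val_loss=2.31)
reject = worth_adopting(1.6, baseline_val_loss=2.31, candidate_val_loss=2.40)
```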

Datology is the clearest evidence this year that the marginal dollar in VLM training has moved from compute to curation. At 2B parameters, pure data curation beat InternVL3.5-2B by about 10 points at 17x less training compute. The near-frontier 4B model hits 3.3x lower response FLOPs than Qwen3-VL-4B, which is a real serving-cost win, not just a leaderboard win.

Star Elastic's 360x is the kind of number that always shrinks under independent evaluation. The thing this number doesn't tell you is how much of it survives a different post-training distribution. Even a 30x hold would restructure how model-size tiers get produced for deployment. One post-training run producing a family eliminates the need to separately post-train each size variant.

The marginal dollar in VLM training just moved from compute to curation. TST gives a free 2-3x on pretraining with no inference change. Both are actionable this quarter.

Adjacent Signal: DuckDB + Kafka Share Groups

Two infrastructure releases landed on the same architectural assumption. The single-node analytics stack is growing multi-node features.

DuckDB Quack protocol: HTTP client-server mode makes DuckDB viable as a shared analytics service. Combined with the ECS Fargate plus Terraform pattern, it is a credible path to deleting Spark-on-Glue jobs that were single-node workloads wearing a distributed costume.
Kafka Share Groups: consumer parallelism decouples from partition count with roughly linear 8x scaling at 32 instances on I/O-bound workloads. The caveat is the part that matters in production. 8x on I/O-bound is not 8x on CPU-bound consumers, and most consumer fleets are mixed.

Both are worth spikes on the specific workloads they address. Neither requires an immediate architectural commitment.

Action items

Spike Token Superposition Training on a 1B continued-pretraining run against a matched-FLOPs baseline this quarter
Audit Glue/EMR job catalog for single-node candidates (<100GB working set) and pilot one on ECS Fargate + DuckDB + Terraform pattern
Benchmark Kafka Share Groups against your most partition-bound consumer group (embedding/enrichment workloads first)
For VLM work: allocate next training budget iteration 60/40 curation/compute rather than 20/80, using Datology's result as the prior

Sources: Claude just metered your agent SDK calls · DuckDB shipped a client-server mode · TLDR Data

◆ QUICK HITS

Update: LiteLLM added to CISA KEV (active exploitation confirmed) — rotate all provider API keys stored in its DB if running versions 1.81.16–1.83.7
SANS AtRisk
Apache Iceberg CVE-2026-42812 (CVSS 9.9): attacker with table-write can redirect metadata to poisoned S3 prefix — training data corruption vector for any lakehouse
SANS AtRisk
Gemini reproducibly emits real phone numbers from training data — 4 independent cases; add a PII extraction eval (canary insertion + divergence attacks) to LLM CI this sprint
The Download from MIT Technology Review
TML-Interaction-Small reports 0.40s turn-taking latency vs. 1.18s for GPT-Realtime-2.0 — a 3x gap via multi-stream 200ms micro-turn architecture; research preview, unverified
Simplifying AI
Duolingo publicly pegs AI-generated content 'slop' at ~20% requiring human QC — use as a calibration anchor for your own LLM acceptance-rate dashboard
TLDR Marketing
Only 15% of organizations have the data foundation for agentic AI (Fivetran); data quality/lineage cited as #1 blocker by ~50% — score agent projects against readiness before committing compute
TLDR Data
LLM-as-a-Verifier beats LLM-as-a-Judge on tie-rate and decision accuracy by decomposing criteria into repeated binary verifications at token granularity — cheapest variance reduction available this quarter
TLDR InfoSec
SAP (€100M partner fund) and ServiceNow (Action Fabric) both converged on Knowledge Graph + MCP as the enterprise agent architecture — treat KG grounding as the default for entity-heavy domains
TLDR IT
AI agents bypass legacy bot detection at 81% success rate — retrain abuse models with agent-generated traffic or your experiment populations are already contaminated
TLDR IT
Persona drift in LLM agents is measurable within 8 dialogue turns (Li et al., COLM 2024) — embed a verbal-tic canary and log per-turn retention as a zero-cost drift detector
Brian Ardinger, Inside Outside Innovation
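
The persona-drift canary from the last quick hit can be logged with a few lines. This sketch assumes the canary is a fixed phrase the persona is instructed to use in every reply and that a case-insensitive substring match is a good enough detector; the function names and sample turns are hypothetical.

```python
from typing import Optional


def canary_retention(assistant_turns: list[str], canary: str) -> list[bool]:
    """Per-turn flag: does the assistant reply still contain the canary phrase?"""
    needle = canary.lower()
    return [needle in turn.lower() for turn in assistant_turns]


def first_drift_turn(retention: list[bool]) -> Optional[int]:
    """1-indexed turn where the canary first disappears; None if it never does."""
    for turn_number, kept in enumerate(retention, start=1):
        if not kept:
            return turn_number
    return None


# Hypothetical dialogue where the persona's verbal tic is the word "Ahoy".
turns = ["Ahoy, here is the plan.", "Ahoy, step two.", "Here is step three."]
retention = canary_retention(turns, "ahoy")
drift_at = first_drift_turn(retention)
```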

◆ Bottom line

The take.

Anthropic killed the flat-rate Claude subsidy, admitted to 80x growth against a 10x capacity plan (hence the April degradation), and is renting 220,000+ GPUs from a competitor to keep the lights on. Meanwhile, Vercel's production data shows 59% of tokens are now multi-turn agentic traces that your eval harness doesn't measure and your cost model doesn't price correctly. Re-cost your Claude workloads before June 15, rebuild evals around trajectories rather than single turns, and add a second frontier provider behind a router before the next capacity miss becomes your outage.

Frequently asked

What exactly changes for Claude billing on June 15, and which workloads are affected?
Subscription-based Claude usage through third-party tools (Conductor, Zed, OpenCode, T3 Code) and programmatic surfaces like Agent SDK, claude-p, and GitHub Actions converts to a separate credit bucket equal to plan value, with overflow billed at metered API rates. There are no rollovers and no subsidized tokens, which removes the 70-90% effective discount Max-plan power users were extracting. Any cost model assuming flat-rate consumption needs to be re-run at API pricing before that date.

Why are single-turn eval harnesses inadequate when 59% of tokens are agentic?
Single-turn harnesses score one response against a reference answer, but the median production request is now a multi-step tool loop with retries, planning, and cache reuse. The failure modes that matter (a planner burning 40,000 tokens arguing with itself, tool-call precision collapse, runaway step counts) are invisible to single-shot scoring. Trajectory-level metrics (task success, tool-call precision/recall, steps-to-completion, cost-per-successful-task) are required to measure what's actually shipping.

How should I rebalance training spend given the Datology and TST results?
For VLM work, shift the next budget iteration toward roughly 60/40 curation-to-compute, using Datology's 17x compute reduction at 2B-4B as the prior that data quality now dominates scale below 10B parameters. For text pretraining, spike Token Superposition Training on a 1B continued-pretraining run against a matched-FLOPs baseline; even a partial replication of the 2-3x wall-clock claim pays back on the next full run with no inference-side change.

Why does the Mozilla 271:1 result mean model choice is the wrong optimization?
The same Claude Mythos Preview weights surfaced 271 real Firefox bugs under Mozilla's custom agentic harness (fuzzer-integrated, ephemeral VMs, sanitizer-grounded truth) and exactly one low-severity curl CVE under an out-of-box scan. A week of domain-specific harness engineering (reproducible test cases, ephemeral execution, integration with existing signal pipelines) yields roughly 50x more signal than swapping frontier models. Teams A/B-testing Claude vs GPT vs Gemini before investing in harness design are optimizing the smaller variable.

What's the minimum instrumentation needed before the next Anthropic invoice lands?
Deploy an LLM gateway like LiteLLM or Portkey with per-user and per-feature tagging plus daily budget alerts, because Anthropic provides no native cost attribution, no per-user telemetry, and no budget alerts. Add a second frontier provider behind a router abstraction with automatic failover on 429/5xx, since the documented 8x capacity-plan miss (80x growth against a 10x plan) quantifies single-provider risk. Then re-baseline Claude Code and Opus benchmarks after the post-Colossus capacity changes; pre-May numbers are contaminated and will misattribute capacity noise to prompt or model changes.
