Edition 2026-05-20 · read as Data Science
AnthropicEndsClaudeSubscriptionDiscount,OpenAICounters
- Sources
- 36
- Words
- 1,545
- Read
- 8min
Topics Agentic AI LLM Inference AI Regulation
◆ The signal
Anthropic killed the 70-90% effective discount on programmatic Claude usage overnight — subscriptions now convert to dollar-matched API credits across Agent SDK, GitHub Actions, and third-party harnesses. On the same day, OpenAI dropped a 2-month-free Codex enterprise switch promo. If you haven't reconciled projected token burn against the new credit cap, you're making a pricing decision by default. June 15 is the cliff for third-party tool credits (Zed, Conductor, OpenCode). Re-run unit economics this sprint.
◆ INTELLIGENCE MAP
01 Anthropic's Triple Squeeze: Metering, Capacity, and June 15
act nowAnthropic metered all programmatic usage at list price, disclosed an 80x capacity miss (planned for 10x), and leased xAI's entire 220K-GPU Colossus 1 cluster. June 15 kills third-party tool subsidies. OpenAI is counter-offering with a 2-month free Codex promo. Any Claude benchmark from before May 7 is stale.
- Colossus GPUs leased
- Ramp B2B share
- ARR growth (4mo)
- Subsidy cliff
- Anthropic B2B Share34.4
- OpenAI B2B Share32.3
02 59% Agentic Tokens: Eval and Cost Models Are Measuring the Minority
act nowVercel's 200K-team production data shows 59% of tokens are now agentic multi-turn traces. Anthropic captures 61% of spend via Opus; Google captures 38% of volume via Flash. Single-turn eval harnesses and input/output cost ratios from 2024 are off by ~5x on spend forecasting for agentic workloads.
- Anthropic spend share
- Google volume share
- MCP token overhead
- No vendor loyalty
03 Lakehouse & MLOps Stack: New CVSS 9.6-9.9 CVEs Across Iceberg, Polaris, Argo CD
monitorBeyond the previously-flagged LiteLLM/Ollama exploits, this week adds Apache Iceberg (CVSS 9.9, metadata write redirect), Polaris (9.9, credential broadening), Argo CD (9.6, plaintext K8s Secret extraction), and PraisonAI (auth bypass weaponized in 4 hours). The entire data/ML reference architecture now has a CVSS 9+ at every layer.
- Iceberg CVSS
- Polaris CVSS
- Argo CD CVSS
- n8n/Kestra CVSS
04 Training Efficiency: TST 2-3x, Datology 17x, Star Elastic 360x
monitorThree research drops landed in one week that change unit economics. Nous TST reports 2-3x wall-clock speedup at matched FLOPs with no inference architecture change (validated 270M→10B). Datology hit +11.7 pts on VLM benchmarks at 17x less compute via data curation alone. NVIDIA Star Elastic claims one post-training run produces a model family at 360x lower cost.
- TST speedup
- Star Elastic saving
- Datology VLM lift
- TST scale tested
05 AI Cyber Capability Crosses AISI Threshold — Harness Beats Model 271:1
backgroundMythos is the first model to clear both AISI simulated attack ranges. Mozilla's custom harness found 271 Firefox bugs with an older Claude model; the same model found 1 CVE in curl without the harness. The 271:1 delta proves eval/harness investment dominates model selection by 50x+ on real codebases. Palo Alto surfaced serious vulns across 130+ products.
- AISI ranges cleared
- Mozilla Firefox bugs
- curl bugs (same model)
- Palo Alto products
- With custom harness271
- Generic scan1
◆ DEEP DIVES
01 Anthropic's Triple Squeeze: Your Claude Budget Just Broke
What Happened
The pricing change is the surface event, and it compounds with two others. Anthropic metered all programmatic Claude usage: subscriptions now convert to dollar-matched API credits across Agent SDK, claude-p, GitHub Actions, and third-party harnesses. The implicit 70-90% discount power users were getting is gone. At Code with Claude, Dario Amodei disclosed that Anthropic planned for 10x growth and got 80x, which is the most plausible read on the quality degradation users have been logging since mid-April. The emergency fix is leasing xAI's entire Colossus 1 cluster (220,000+ GPUs), from the CEO who called Anthropic 'misanthropic' three months ago.
The June 15 Cliff
Starting June 15, Claude usage through third-party tools (Conductor, Zed, OpenCode, T3 Code) draws from a separate credit bucket equal to plan value. No subsidized tokens, no rollover, overflow billed at API rates. Claude Code and the Claude app are unaffected. Weekly rate limits get bumped fifty percent for two months as a softener.
Surface Before After (announced) Claude Code limits 5-hour cap Doubled Peak-hours throttle Reduced for Pro/Max Removed Opus API rate limits Squeezed 'Substantially raised' Third-party tools Subsidized Credit-capped, overflow at list The Counter-Offensive
OpenAI dropped a 2-month-free Codex enterprise switch promo the same day. Ramp's April data shows Anthropic edging OpenAI 34.4% vs 32.3% in B2B spend, the first apparent lead change. The thing the Ramp number doesn't tell you is whether the lead survives the metering change. OpenAI is pricing a response against the exact developers Anthropic just alienated, and a free evaluation window is an asymmetric-payoff bet for the buyer.
Why This Matters For Your Stack
ServiceNow already burned its full-year Claude budget by May after the price hikes. Anthropic provides no native per-user or per-tool usage telemetry and no SLAs, which is unusual for a dependency on the critical path. The vendor dashboard cannot attribute spend to a tenant or a prompt. That instrumentation has to come from the consumer side.
Any Claude benchmark from before May 7 is stale. Any cost model built before the metering change is numerically wrong, not directionally wrong.
The Capacity Aftermath
The 80x growth miss means observed 'quality degradation' was quantization, smaller-model routing under load, and scheduler unfairness, not product changes. The Colossus integration (H100, H200, GB200 heterogeneous fleet) will produce p95/p99 variance during transition. Tail latency gets weirder before it stabilizes, and that is the metric to watch, not the mean.
Action items
- Audit every Claude-backed workload (Agent SDK, GitHub Actions, batch evals) against the new credit cap and flag jobs that exhaust credits before month-end
- Deploy an LLM gateway (LiteLLM/Portkey) with per-user, per-feature tagging and daily token budget alerts in front of all Claude traffic
- Run a 2-month Codex evaluation under OpenAI's free promo with matched prompts and tool schemas against your existing Claude harness
- Re-baseline Claude throughput and latency benchmarks after Colossus integration stabilizes (target: late May)
- Reforecast Claude spend for teams using Zed, Conductor, or OpenCode — model post-June-15 scenario where overflow hits API rates
Sources:Claude just metered your agent SDK calls · Claude Code latency on long-context requests · Anthropic ships no per-user usage telemetry · Anthropic passes OpenAI in B2B · Vercel published a number worth sitting with · Anthropic passed OpenAI in enterprise share
02 59% Agentic Tokens: Your Eval Harness and Cost Model Are Measuring the Minority
The Production Data
Vercel's AI Gateway, covering 200,000 teams over 7 months, reports 59% of production tokens as agentic now: multi-turn, tool-calling traces rather than single-shot completions. Six months ago the figure was under 20%. The composition has shifted faster than at any point since chat endpoints replaced completion endpoints.
The Spend-Volume Divergence
The provider landscape has split along two axes that do not line up:
Provider Share of Spend Share of Volume Primary Model Implied Role Anthropic 61% — Opus Reasoning/planning nodes Google — 38% Flash High-throughput utility calls Others ~39% ~62% Mixed Mixed Teams are already routing across providers: expensive models for reasoning, cheap models for throughput. The Gateway data shows no vendor loyalty. A serving layer hardcoded to one provider SDK is out of step with what 200K production teams are actually running.
Why Existing Harnesses Fail
Most agent eval harnesses score single-turn responses against reference answers. That was the right call in 2023. Agentic traces now run at a 15:1 input-to-output ratio, with heavy cache reuse on some providers and none on others. A forecast built on last year's 3:1 ratio is off by roughly 5x on spend. The metrics that matter now:
- Cost-per-successful-task, not cost-per-token
- Tool-call precision and recall, not single-turn accuracy
- Steps-to-completion and retry rates
- Recovery-from-error rate across the trajectory
Glean's benchmark claims off-the-shelf MCP burns 30% more tokens than retrieval-tuned knowledge graphs on agentic tasks. Vendor-published with no methodology, so treat it as a hypothesis. The thing this doesn't tell you is whether the gap holds outside Glean's own task distribution. The failure mode is real, though: MCP tool listings balloon context windows, and naive tool outputs return verbose blobs where a reranked snippet would do.
The Routing Architecture That Wins
Abridge (80M+ clinical conversations, 250 health systems) discloses the pattern: cheap fast model for triage, large model for reasoning only when needed. Post-training on proprietary data beats frontier on narrow tasks at lower cost. The plausible cost reduction envelope from tiered routing is 20-40% at constant trajectory completion rate. If it lands at half that, the routing layer still pays for itself.
If 59% of your tokens are agentic but 100% of your evals are single-turn, you're flying instruments-out.
Action items
- Add trajectory-level metrics (task success, tool-call F1, steps-to-completion, cost-per-successful-task) to your eval harness this sprint
- Instrument per-node token cost in your agent graph and route utility calls (summarization, JSON extraction, query rewriting) to Flash/Haiku-class models
- Run a 1-hour spike measuring token overhead of your current MCP/tool-calling setup vs. a retrieval-first baseline on 100 production traces
- Add LLM-judge-to-human-annotator agreement as a tracked quarterly metric in your eval pipeline
Sources:Agentic traffic crossed fifty-nine percent · Vercel published a number worth sitting with · The CyberGym result · Abridge runs model routing across 100M conversations · MCP plus knowledge graphs
03 Three Training Efficiency Results That Change Your Q3 Compute Budget
The Drops
A cluster of pretraining results landed in one week, each nudging training unit economics in a direction that matters for any team running pretraining, fine-tuning, or distillation.
Work Claim Scale Validated Inference Impact Replication Risk Nous Research TST 2-3x wall-clock at matched FLOPs 270M → 10B-A1B MoE None — no architecture change Medium; single-source NVIDIA Star Elastic 360x cheaper model-family derivation Not specified Produces family of sizes High; lab-reported Datology VLM curation +11.7 pts, 17x less compute 2B and 4B Lower response FLOPs Medium; benchmark-selection Why TST Is the One to Spike First
Token Superposition Training is a pretraining recipe change with no inference-side downstream. If it replicates, it's a free 2-3x speedup. The mechanism, superposing multiple token targets per forward pass, has been validated from 270M to 10B-A1B MoE scale. No architecture change means existing serving infrastructure stays untouched. This is the cleanest 'try it and know within a week' experiment on the list.
Datology: The Marginal Dollar Moved to Curation
Datology's result is the clearest evidence this year that the marginal dollar in VLM training has moved from compute to curation. Their 2B model beat InternVL3.5-2B by ~10 points across 20 benchmarks while using 17x less training compute, and their 4B matched near-frontier quality at 3.3x lower response FLOPs than Qwen3-VL-4B. The serving cost reduction is real. The thing this doesn't tell you is whether the curation advantage is durable or copied within two quarters.
Star Elastic's 360x Needs Discount
NVIDIA's claim that one post-training run produces a family of reasoning model sizes at 360x lower cost than pretraining a family is the kind of headline number that always shrinks under independent evaluation. Even a 30x hold would restructure how size tiers get produced. But 360x→30x is a common compression pattern in lab-reported claims. Wait for replication before planning around it.
Combined Implication
Together these results suggest that Q3 2026 training runs priced at Q1 2026 efficiency assumptions are systematically overspending. A team about to allocate $X to a continued-pretraining run should first check whether TST halves the wall-clock, whether curation halves the data requirement, and whether Star Elastic removes the need to pretrain multiple sizes. The order matters: TST replicates fastest, Datology requires a curation pipeline you may not have, and Star Elastic is still on the wait-for-replication shelf.
The marginal dollar in VLM training moved from compute to curation. The marginal dollar in pretraining moved from more FLOPs to better recipes. Budgets that don't reflect this are overpaying.
Action items
- Spike Token Superposition Training on a 1B continued-pretraining run against a matched-FLOPs baseline this month
- Audit your VLM/multimodal training data pipeline for curation quality — measure per-shard quality scores and deduplicate before the next training run
- Hold off on pretraining separate model size tiers until Star Elastic publishes reproducible details
- Pull SWE-ZERO-12M-trajectories (112B tokens, 12M trajectories, 3K repos) and begin preprocessing before licensing frictions appear
Sources:Claude just metered your agent SDK calls · DuckDB shipped a client-server mode this week
04 Apache Iceberg/Polaris CVSS 9.9 and Argo CD 9.6: The Lakehouse Trust Boundary Shrank
What's New This Week
On top of the LiteLLM (KEV-listed) and Ollama exploits flagged last week, four more critical CVEs landed on the data and ML stack in the same window. The blast radius is the infrastructure most DS teams treat as trusted by default, which is the part that matters.
The New CVEs
Component CVE / CVSS Attack Class Blast Radius Apache Iceberg CVE-2026-42812 / 9.9 Metadata write redirect Poisoned tables, corrupted training data Apache Polaris CVE-2026-42809/10/11 / 9.9 Credential broadening S3/GCS creds, cross-tenant access Argo CD 3.2.x/3.3.x CVE-2026-42880 / 9.6 Missing authorization Plaintext K8s Secret extraction PraisonAI CVE-2026-44338 Auth bypass Agent runtime, connected DBs NGINX rewrite 18-year latent RCE Unauthenticated RCE Model-serving ingress Why Iceberg CVSS 9.9 Is the Worst
An attacker with table-write permission can point metadata at an attacker-controlled S3 prefix. The next query reads poisoned Parquet. The next training run ingests silently corrupted features. The thing CVSS doesn't tell you is that most lakehouse observability does not detect metadata pointer mutations. Default logging covers row changes, not pointer changes. Combined with Polaris credential-broadening, there is a plausible path from compromised analyst notebook to cross-tenant data theft.
Argo CD: Every Secret Is Disclosed
For teams running training jobs or model services on Argo CD 3.2 or 3.3, treat every K8s Secret in reachable namespaces as disclosed until patched and rotated. That includes model-registry tokens, HuggingFace PATs, DB passwords, and cloud credentials. The rotation costs more than the patch. It is also the part that actually closes the incident.
PraisonAI: Agent Frameworks Are First-Class Targets
PraisonAI's auth bypass was weaponized within 4 hours of CVE disclosure. Four hours is the operational signal: agent frameworks have crossed the adoption threshold where threat actors monitor their disclosure feeds. Any team prototyping with LangChain, CrewAI, AutoGen, or PraisonAI whose runtime holds API keys sits in the same blast radius.
Draw a reference architecture for a modern data team, throw darts at it, and every throw hits a CVSS 9.0 or higher this week.
Action items
- Patch Argo CD to ≥3.2.12/≥3.3.10 and rotate every Kubernetes Secret in namespaces it can read — including model-registry tokens, HF PATs, and cloud credentials
- Audit Iceberg/Polaris catalog configurations: enforce explicit storage credential scoping and add write-path allowlisting for table metadata locations
- Inventory all agent frameworks in use and pin versions; patch PraisonAI immediately if present; add CVE-feed subscriptions for LangChain, CrewAI, AutoGen
- Patch NGINX across all inference gateways and audit rewrite-module usage in model routing configs
Sources:LiteLLM landed in the KEV catalog this week · An Ollama endpoint exposed to the public internet · PraisonAI, an open-source multi-agent framework · SANS AtRisk newsletter
◆ QUICK HITS
DuckDB shipped Quack HTTP client-server mode — viable Spark/Glue replacement for sub-100GB single-node ETL; spike one job on ECS Fargate + DuckDB + Terraform pattern
DuckDB shipped a client-server mode this week
Kafka Share Groups decouple consumer parallelism from partition count with ~linear 8x scaling at 32 instances on I/O-bound workloads — benchmark your most partition-bound consumer group
DuckDB shipped a client-server mode this week
Only 15% of organizations have the data foundation for agentic AI at scale (Fivetran); data quality/lineage is #1 blocker cited by ~50% — use as gate before greenlighting agent projects
DuckDB shipped a client-server mode this week
Microsoft agent memory architecture (consolidation + forgetting + delayed maturation) stabilizes at 400-500 memories with 97.2% retention precision — concrete alternative to flat vector top-k
DuckDB shipped a client-server mode this week
DeepSeek V4 Pro scored 77/100 on FlowGraph at $2.25/task, sitting between Opus 4.7 and Kimi K2.6 — re-benchmark for cost-per-successful-task on your production distribution
The CyberGym result is the kind of finding
PyTorch 2.12 shipped MX quantization export — the highest-leverage framework change this week for teams under inference cost pressure deploying low-precision models
The CyberGym result is the kind of finding
TML-Interaction-Small reports 0.40s turn-taking latency vs 0.57s Gemini-3.1-flash-live and 1.18s GPT-Realtime-2.0 — full-duplex voice is becoming its own model class
TML is reporting 0.40 seconds of full-duplex latency
Opus 4.7 tripled image processing costs — re-price multimodal inference budgets and evaluate GPT-4V/Gemini head-to-head on your actual vision workload
Anthropic passes OpenAI in B2B
Compute demand-to-supply ratio is 4:1 at Nebius (684% YoY revenue, $3-3.4B 2026 guidance vs $530M 2025) — lock H2 GPU reservations across 2+ providers before quarterly sellouts
The 4:1 ratio is the headline number
Duolingo publicly pegs AI-generated content rejection rate at ~20% requiring human QC — anchor your own generation pipeline acceptance-rate targets against this production baseline
Duolingo's twenty percent AI slop rate
Update: LLM-as-a-Verifier reportedly beats LLM-as-a-Judge on tie-rate and decision accuracy by decomposing criteria into repeated binary verifications at token granularity
An Ollama endpoint exposed to the public internet gets picked up by Shodan
PCAOB/COSO guidance now requires deterministic execution and tamper-evident audit trails for ML in regulated finance — audit GPU non-determinism, seed management, and model-artifact immutability
The transformer underwriting models are outperforming
◆ Bottom line
The take.
Anthropic killed the implicit subsidy on programmatic Claude usage the same week Vercel confirmed 59% of production tokens are agentic — meaning your cost model and your eval harness are both measuring last year's workload. Meanwhile, Apache Iceberg, Polaris, and Argo CD shipped CVSS 9.9 vulnerabilities that can poison training data or disclose every secret in your cluster. Re-run unit economics before June 15, rebuild the eval harness around trajectories not turns, and patch the lakehouse before the next training job ingests corrupted features.
Frequently asked
- What changed with Claude subscriptions converting to API credits?
- Anthropic ended the implicit 70-90% discount on programmatic Claude usage. Subscriptions now convert to dollar-matched API credits across the Agent SDK, claude-p, GitHub Actions, and third-party harnesses, so workloads that previously ran under flat-rate plans now meter against a hard credit cap with overflow billed at list API rates.
- Why does June 15 matter for teams using Zed, Conductor, or OpenCode?
- Starting June 15, Claude usage through third-party tools draws from a separate credit bucket equal to plan value, with no subsidized tokens and no rollover. Overflow bills at API rates. Claude Code and the Claude app are exempt. Teams need a reforecast before sprint planning locks, since spend on those surfaces can spike materially overnight.
- How should eval harnesses change given that 59% of production tokens are now agentic?
- Single-turn accuracy scoring no longer represents the majority workload. Add trajectory-level metrics: cost-per-successful-task, tool-call precision and recall, steps-to-completion, retry rate, and recovery-from-error rate. Vercel's data across 200K teams shows agentic share tripled in six months, and harnesses scoring single-turn responses against reference answers are measuring the minority case.
- Which training efficiency result is worth spiking first?
- Token Superposition Training from Nous Research. It's a pretraining recipe change validated from 270M to 10B-A1B MoE scale, claiming 2-3x wall-clock at matched FLOPs with no architecture change and no inference-side impact. Existing serving infrastructure stays untouched, making it a clean week-long experiment. Datology's curation result and NVIDIA's Star Elastic 360x claim need more replication before betting budget on them.
- What's the most urgent action from this week's CVE cluster on the data stack?
- Patch Argo CD to ≥3.2.12/≥3.3.10 and rotate every Kubernetes Secret in namespaces it can read, including model-registry tokens, HuggingFace PATs, DB passwords, and cloud credentials. The missing-authorization flaw (CVSS 9.6) means every secret was readable by any authenticated user, so rotation, not just patching, is required to close the incident.
◆ Same day, different angle
Read this day as…
◆ Recent in data science
Keep reading.
- Princeton's ICML 2026 audit added GPT 5.5, Gemini 3.5 Flash, and Claude Opus 4.7 and found zero meaningful reliability improvement over pred…
- Hugging Face Transformers has an RCE path that fires from model config files — not pickle weights — across 2.2 billion installs.
- Anthropic ended the flat-rate Claude subsidy this week.
- Anthropic killed the flat-rate Claude subscription this week.
- Anthropic quietly killed the 70-90% effective discount on programmatic Claude usage — subscriptions now convert to dollar-matched API credit…