Edition 2026-05-05 · read as Leader
Anthropic's 60% Autonomous AI R&D Forecast Compresses Timelines
Sources: 37 · Words: 1,880 · Read: 9 min
Topics: Agentic AI · LLM Inference · AI Capital
◆ The signal
Anthropic's Jack Clark now puts autonomous AI R&D at 60%+ probability by end of 2028, and the evidence is harder to wave off than last quarter's version: training optimization moved from 2.9× to 52× in under twelve months, autonomous task horizons improved 1,440× in four years, and SWE-Bench reached 93.9%. In the same week, Uber disclosed Claude Code running $500–$2,000 per engineer per month, enough to burn its entire annual AI budget in four months. The three-year plan is a two-year plan, and the cost assumptions underneath it are off by roughly a factor of three.
◆ INTELLIGENCE MAP
01 PE Captures AI Distribution — $11.5B in Deployment JVs
Act now: Anthropic's $1.5B JV with Blackstone/Goldman/H&F and OpenAI's $10B JV with 19 PE firms have created a new distribution layer that routes around enterprise procurement. PE sponsors with portfolio companies numbering in the thousands are becoming the default AI channel for mid-market. The vendor evaluation is over before it starts.
02 AI Tool Cost Crisis — Budgets Off by 3-4×
Act now: Uber burned its entire annual AI budget in 4 months at $500–$2K/engineer/month for Claude Code. A single Copilot agentic session consumed $221 against a $40 subscription. Of major AI coding tools, only Replit claims profitability ($1B run rate, 300% NRR). DeepSeek V4 Pro offers a 17× cheaper alternative. The subsidy era ends this year.
- Actual agentic cost: $221
- Subscription price: $40
03 Autonomous AI R&D — 60% by 2028 Reframes Planning
Monitor: Anthropic co-founder Jack Clark's thesis is backed by five converging metrics: 52× training speedup in 12 months, 1,440× task horizon improvement in 4 years, SWE-Bench at 93.9%, CORE-Bench solved. $500M+ raised by Recursive Superintelligence alone. OpenAI targeting 'automated AI research intern by September 2026.' The three-year strategy becomes a two-year strategy.
- Training optimization (2024): 2.9×
- Training optimization (2025): 52×
04 Five Eyes Agent Governance — Compliance Clock Started
Monitor: NSA-led Five Eyes guidance maps AI agents onto zero trust and least privilege. Pattern from prior advisories: 12–24 months to binding procurement requirements. Machine identity management for agents identified as emerging capability gap. FedRAMP 20x landing zones lower federal market entry costs, reshaping government SaaS competition.
- Joint guidance (now): Five Eyes advisory published
- Procurement language: 6-12 months
- Binding requirements: 12-24 months
- Audit enforcement: 24-36 months
05 Security Convergence — Three Pillars Failing Simultaneously
Background: ODNI is pulling back from systemic state-actor tracking. 66% of cybersecurity staff are flight risks. npm supply chain attacks now weaponize auto-update (572K+ weekly downloads compromised). GPT-5.5 solved every CTF in a test set. AI-assisted offense has lapped defensive stacks. The combination creates a compounding threat no single budget increase addresses.
◆ DEEP DIVES
01 PE Becomes the AI Distribution Kingmaker — And Your GTM Plan Wasn't Built for It
The Channel That Didn't Exist 90 Days Ago Now Owns Mid-Market Access
Anthropic's $1.5 billion joint venture with Blackstone, Hellman & Friedman, Goldman Sachs, and General Atlantic, paired with OpenAI's $10 billion deployment JV alongside a 19-firm PE consortium, is the fastest GTM buildout enterprise software has produced. The sponsors involved collectively own tens of thousands of portfolio companies. When a general partner tells a portfolio CEO to deploy Claude for back-office automation, the vendor evaluation is already over by the time it starts.
The asset being acquired is not the EBITDA. It is the permission to land software in accounts that the vendors cannot reach directly.
Why This Is Different From Prior Channel Plays
A reasonable skeptic would point out that PE-mediated software selection has historically been looser than the org chart suggests. The skeptic is correct about the past. A sponsor with committed capital in a deployment JV does not behave like a sponsor with a preferred-vendor list. The incentive is different. The reporting cadence is different. The default is different.
Anthropic's structure is the more instructive half of the story. Funding dedicated integration consultants through the JV is functionally buying a sales force with customer access included. This is the template Accenture built for cloud transformation, with the model provider owning the consulting relationship this time. If it works, every other lab copies the template inside eighteen months.
The Bifurcation
Enterprise AI distribution has split into two motions that do not compose:
- OpenAI's workspace-embed model: Codex integrates into files, docs, spreadsheets, and slides, building switching costs through code that ships daily
- Anthropic's institutional-mandate model: PE sponsors drive top-down adoption across thousands of companies at the same time
Both can work. Only one can win the same customer in the same quarter. Switching costs on workspace-integrated tooling compound faster because the integration touches production code. The decision about which vendor owns the workspace gets harder to reverse every quarter it is left alone.
What This Means For Your Pipeline
The competitive set for any enterprise AI vendor now includes a capital stack it was not modeled against. If a competitor ships pre-negotiated into twenty portfolio companies before the sales cycle opens, the win rate in those accounts is not a function of the product. It is a function of who arrived with the check.
Action items
- Map your customer base against PE consortium ownership within 30 days — identify which accounts are now in the OpenAI/Anthropic distribution lock-in zone
- Evaluate Anthropic's PE JV as a channel conflict or partnership opportunity for your enterprise sales by end of Q2
- Build or identify a capital-backed distribution partner by Q3 — the independent channel partner is being replaced by portfolio-company mandates
Sources: AI Weekly · The Information AM · 🔳 Turing Post · TLDR AI · AI Breakfast · Simplifying AI
02 The AI Budget Crisis Is Here: Agentic Costs 3-4× What You Planned
Uber's Revelation Is Your Preview
Uber disclosed that Claude Code costs $500 to $2,000 per engineer per month and that the company burned its annual AI budget in four months. In a separate incident, one Copilot agentic session consumed $221 of inference against a forty-dollar subscription. The temptation is to treat these as outliers. They are not. They are the early shape of agentic cost curves, and they invalidate the budget assumptions most finance teams wrote in 2024 and early 2025.
The AI line item is not behaving like software seats and is not behaving like cloud compute. It scales with engineer output — which is the variable the CFO was told would go up.
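The arithmetic behind a budget stress test is short enough to sketch. The snippet below is a minimal illustration, not Uber's method: the headcount and the planned per-engineer figure are hypothetical placeholders, and only the $500–$2,000 range and the four-month burn come from the disclosure above.

```python
def months_until_budget_exhausted(annual_budget: float,
                                  engineers: int,
                                  cost_per_engineer_month: float) -> float:
    """Months of runway for a fixed annual budget at a given monthly unit cost."""
    monthly_spend = engineers * cost_per_engineer_month
    return annual_budget / monthly_spend


def overrun_multiple(months_to_exhaustion: float) -> float:
    """How many annual budgets a full year consumes at the observed burn rate."""
    return 12 / months_to_exhaustion


# Burning a 12-month budget in 4 months implies a 3x annualized overrun.
uber_style_overrun = overrun_multiple(4)

# Hypothetical plan (assumption, not from the source):
# 1,000 engineers budgeted at $150 per engineer per month.
ENGINEERS = 1_000
PLANNED_UNIT_COST = 150.0
annual_budget = ENGINEERS * PLANNED_UNIT_COST * 12

for unit_cost in (500.0, 2_000.0):  # range disclosed by Uber
    runway = months_until_budget_exhausted(annual_budget, ENGINEERS, unit_cost)
    print(f"${unit_cost:,.0f}/eng/month: budget lasts {runway:.1f} months, "
          f"{overrun_multiple(runway):.1f}x annualized overrun")
```

Swapping in real headcount and the unit cost from the original plan shows how far the current budget sits from the disclosed range.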
The Subsidy War Cannot Last
A reasonable skeptic would say the pricing will settle once the providers compete on efficiency. The reasonable skeptic has a point, and also a problem. Of the three names that matter in AI coding tools, only Replit claims profitability, at roughly a billion-dollar run rate with 300% net revenue retention. OpenAI's Codex is heavily subsidized. Cursor is margin-negative. Anthropic doubled Claude Code enterprise token costs in the same window that DeepClaude demonstrated a 17× cheaper alternative on DeepSeek V4 Pro. The subsidies end when the capital patience ends, and the capital patience is already thinning.
| Vendor | Pricing | Unit Economics | Moat |
| --- | --- | --- | --- |
| OpenAI Codex | $40/seat (subsidized) | Deeply negative | Workspace integration |
| Cursor | ~$20/seat | Margin negative | Developer workflow |
| Replit | Consumption | Profitable | Non-technical users |
| DeepSeek V4 Pro | Open weights | 17× cheaper | Cost leadership |
The Dual-Stack Is Now Mandatory
The defensible architecture for the next two years is proprietary models for differentiated work and open weights for everything else. IBM's Granite 4.1 (30B parameters, 512K context, Apache 2.0, zero licensing cost) and DeepSeek V4 (1.6 trillion parameters, cheap inference) provide the escape valve. The uncomfortable part is not the architecture. The architecture is well understood. The dual-stack most organizations need is an operational capability they have not built.
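The routing rule itself fits in a few lines, which is why the hard part is operational rather than architectural. The sketch below is illustrative only: the `Workload` type, the `differentiated` flag, and the example workload names are hypothetical, and a real router would also weigh data sensitivity, latency, and evaluation results.

```python
from dataclasses import dataclass
from enum import Enum


class Stack(Enum):
    PROPRIETARY = "proprietary"
    OPEN_WEIGHTS = "open_weights"


@dataclass(frozen=True)
class Workload:
    name: str
    differentiated: bool  # hypothetical flag: does this work create competitive advantage?


def route(workload: Workload) -> Stack:
    """Proprietary models for differentiated work, open weights for everything else."""
    return Stack.PROPRIETARY if workload.differentiated else Stack.OPEN_WEIGHTS


# Hypothetical workloads, for illustration only.
workloads = [
    Workload("pricing-model-research", differentiated=True),
    Workload("log-summarization", differentiated=False),
    Workload("ticket-triage", differentiated=False),
]

routing = {w.name: route(w) for w in workloads}
open_share = sum(s is Stack.OPEN_WEIGHTS for s in routing.values()) / len(routing)
print(routing)
print(f"share routed to open weights: {open_share:.0%}")
```

The design question this leaves open is who sets the `differentiated` flag and how often it is reviewed. That is a governance decision, not a code decision.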
The Repricing Is Coming
Microsoft's formal move from per-seat to consumption-based pricing is the signal worth reading. Flat-rate AI subscriptions do not survive contact with agentic workloads. The top ten percent of users consume 50× the median in agentic patterns, and any firm still offering a flat-rate AI capability is absorbing that distribution at its own expense. The board-deck version of the decision is to raise prices. The more useful version is to lead the industry into consumption pricing this quarter, while competitors are still subsidizing their heavy users next quarter.
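A rough model shows why flat-rate pricing breaks under this distribution. The sketch below assumes a deliberately simplified two-tier population, which is an assumption and not data from the source: the top 10% of users each consume 50× the median, and everyone else consumes exactly the median.

```python
def mean_cost_in_medians(heavy_fraction: float, heavy_multiple: float) -> float:
    """Average per-seat cost, expressed in multiples of the median user's cost."""
    return heavy_fraction * heavy_multiple + (1 - heavy_fraction) * 1.0


def heavy_user_cost_share(heavy_fraction: float, heavy_multiple: float) -> float:
    """Fraction of total inference cost attributable to the heavy tier."""
    heavy_cost = heavy_fraction * heavy_multiple
    return heavy_cost / mean_cost_in_medians(heavy_fraction, heavy_multiple)


# Two-tier assumption: 10% of seats at 50x the median, 90% at the median.
mean_cost = mean_cost_in_medians(0.10, 50)
heavy_share = heavy_user_cost_share(0.10, 50)

# The Copilot incident cited earlier: $221 of inference on a $40 subscription.
session_to_subscription = 221 / 40

print(f"average seat costs {mean_cost:.1f}x the median seat")
print(f"heavy tier drives {heavy_share:.0%} of total cost")
print(f"one session consumed {session_to_subscription:.1f}x the monthly subscription")
```

Under these assumptions a flat price set near the median covers only a fraction of average cost, which is the exposure a consumption model with caps is meant to remove.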
Action items
- Conduct emergency AI budget stress-test this week using Uber's revealed unit economics ($500-$2K/eng/month) as the new baseline assumption
- Stand up an open-weight deployment capability within 60 days — evaluate DeepSeek V4 and IBM Granite 4.1 for non-differentiated workloads
- Renegotiate any flat-rate AI contracts expiring in the next 12 months toward consumption models with caps — before vendors reprice unilaterally
Sources: AI Weekly · TLDR AI · AINews · Simplifying AI · Unwind AI · TLDR Founders
03 The 2028 Compression: Autonomous AI R&D Changes What a Strategy Means
The Evidence Is Stronger Than the Headline
Jack Clark's thesis, a 60%+ probability of fully autonomous AI R&D by the end of 2028, rests on five independent capability measurements converging at once:
- SWE-Bench coding: 2% to 93.9% in 2.5 years (saturated)
- METR autonomous task horizons: 30 seconds to 12 hours in 4 years (1,440× improvement)
- Training optimization: 2.9× to 52× speedup in under 12 months (an 18× jump)
- CORE-Bench: declared solved 15 months after launch
- PostTrainBench: roughly 50% of human performance, the current ceiling
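The multiples quoted in the list above can be rechecked directly from the stated endpoints. This is plain arithmetic on the figures as given; the per-year SWE-Bench rate is a derived average, not a number from the source.

```python
# METR task horizon: 30 seconds to 12 hours.
task_horizon_gain = (12 * 60 * 60) / 30

# Training optimization: 2.9x to 52x speedup.
training_jump = 52 / 2.9

# SWE-Bench: 2% to 93.9% over 2.5 years (derived average, in points per year).
swe_bench_points_per_year = (93.9 - 2.0) / 2.5

print(f"task horizon gain: {task_horizon_gain:,.0f}x")
print(f"training optimization jump: {training_jump:.1f}x")
print(f"SWE-Bench average gain: {swe_bench_points_per_year:.1f} points/year")
```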
A reasonable skeptic would point out that benchmarks saturate faster than real-world capability generalizes. The skeptic is correct. What the skeptic does not explain is why three independent measures (training efficiency, task horizon, and benchmark ceiling) are all moving in the same direction at the same time. One curve is a story. Three moving together is a trend line a board has to plan against.
The two-year window has to produce the optionality the third year was supposed to provide — which is a different planning exercise than simply moving the deadline forward.
The Gap That Matters
The critical strategic variable is the gap between AI engineering, largely achieved, and AI research, still uncertain. Today's systems reproduce papers and ship working code, and they can tune a training run when the target is specified. What they still struggle with is the creative leap that produces a paradigm shift. PostTrainBench at roughly 50% of human is the canary. If it saturates on the 12-18 month curve SWE-Bench and CORE-Bench followed, Clark's window stops looking aggressive.
The Alignment Constraint
Clark's arithmetic on recursive deployment is the uncomfortable part: a 99.9%-accurate alignment technique degrades to 60.5% after 500 recursive generations. Any self-improving deployment without a stability story is building on a foundation that decays by design. This is an engineering constraint today and a regulatory requirement tomorrow. The governance work has to start before the regulator makes it a filing.
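The decay is ordinary compounding. The sketch below assumes each generation independently preserves alignment with the same probability and that errors multiply, which is the simplest reading of the claim and not necessarily Clark's exact model. Under that assumption, 0.999 raised to the 500th power comes out at roughly 0.606, in line with the figure cited.

```python
import math


def retained_fidelity(per_generation: float, generations: int) -> float:
    """Fidelity left after compounding a per-generation accuracy."""
    return per_generation ** generations


def generations_until(per_generation: float, floor: float) -> int:
    """Smallest generation count at which fidelity is at or below `floor`."""
    return math.ceil(math.log(floor) / math.log(per_generation))


for n in (50, 500):
    print(f"after {n} generations: {retained_fidelity(0.999, n):.1%}")

print(f"generations until fidelity falls to 50%: {generations_until(0.999, 0.5)}")
```

The same function gives a quick way to ask the governance question in reverse: for a tolerated floor, how many recursive generations can run before a human checkpoint is required.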
What This Means Operationally
All three frontier labs have stated their timelines in public. OpenAI is targeting an "automated AI research intern by September 2026." A startup literally named Recursive Superintelligence has raised more than $500M. The emerging 'machine economy', capital-heavy and human-light firms where agents transact with agents, is the logical endpoint of these curves rather than a separate thesis bolted on top.
The hiring plan set this quarter determines 2028 capability, which is the board-deck version. The complete version is that the firms building internal recursive improvement loops now, with the evaluation harness and the humans who know how to supervise a system that proposes its own experiments, compound a lead the packaged vendor version cannot close. That vendor version arrives 18 months after the early movers have already compounded.
Action items
- Conduct a 'recursive improvement stress test' of your 3-year strategic plan by end of Q2 — model scenarios where AI capabilities are 10× and 100× current levels
- Stand up a team tasked with using AI to automate your own R&D workflows — start with kernel optimization, code generation, and automated testing pipelines
- Shift hiring emphasis from raw coding ability to AI orchestration, system design judgment, and research taste — update leveling rubrics this quarter
- Build an AI safety and alignment governance framework before regulators mandate one — specifically model compounding alignment error for any recursive AI workflows
Sources: Jack Clark from Import AI · 🔳 Turing Post · AINews
04 Five Eyes Governance + Security Convergence: The Next 12 Months of Agent Compliance Are Written
The Regulatory Clock Is Running
Five allied nations, led by NSA, published joint guidance on autonomous AI agents last week. The pattern from here is not speculative. Every prior zero-trust directive from this set of agencies became procurement language inside 18 months, which is the sequence worth naming explicitly: joint guidance, then national standards, then procurement requirements, then audit expectations. The track record is not ambiguous, and the clock started when the guidance dropped.
The more interesting choice is what the guidance did not do. It did not invent a new framework. It mapped agents onto zero trust, least privilege, and defense-in-depth. That is a deliberate decision to ship imperfect controls fast rather than purpose-built controls slowly. It also moves the maturity bar. Zero-trust programs that stopped at network segmentation, without extending to workload identity for autonomous actors, are now operating against a documented gap.
Machine Identity: The Market That Doesn't Exist Yet
Current identity platforms were designed for 10,000 employees and a manageable multiple of machine identities. In an agentic world that ratio inverts to 10,000 employees and 10 million ephemeral agent identities per day. The infrastructure to issue, validate, rotate, and revoke credentials at that velocity does not exist in most environments. The board-level question is build, acquire a startup solving it, or partner with a platform vendor extending into it.
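To make the velocity concrete: 10 million identities per day is roughly 116 issuances per second, sustained, before counting validation and revocation traffic. The sketch below pairs that arithmetic with a toy in-memory issuer covering the four lifecycle operations named above. Everything in it is hypothetical and illustrative: the class names, the five-minute TTL, and the scope strings are assumptions, and a real system would use signed credentials and a durable, distributed store.

```python
import secrets
import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


def issuance_rate_per_second(identities_per_day: int) -> float:
    """Average sustained issuance rate implied by a daily identity volume."""
    return identities_per_day / SECONDS_PER_DAY


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str
    scopes: frozenset
    token: str
    expires_at: float


class InMemoryIssuer:
    """Toy issuer: short-lived, scope-limited tokens held in a dict."""

    def __init__(self, ttl_seconds: float = 300.0):  # hypothetical default TTL
        self.ttl_seconds = ttl_seconds
        self._active = {}  # token -> AgentCredential

    def issue(self, agent_id, scopes, now=None):
        now = time.time() if now is None else now
        cred = AgentCredential(agent_id, frozenset(scopes),
                               secrets.token_urlsafe(32), now + self.ttl_seconds)
        self._active[cred.token] = cred
        return cred

    def validate(self, token, scope, now=None):
        """True only if the token is known, unexpired, and grants `scope`."""
        now = time.time() if now is None else now
        cred = self._active.get(token)
        return cred is not None and now < cred.expires_at and scope in cred.scopes

    def rotate(self, token, now=None):
        """Replace a known token with a fresh one carrying the same scopes."""
        old = self._active.pop(token)  # raises KeyError for unknown tokens
        return self.issue(old.agent_id, old.scopes, now)

    def revoke(self, token):
        self._active.pop(token, None)


print(f"{issuance_rate_per_second(10_000_000):.0f} issuances/second on average")
print(f"{10_000_000 // 10_000} agent identities per employee per day")
```

The least-privilege mapping shows up in `validate`: a credential is rejected unless it carries the specific scope requested, and expiry bounds the damage of a leaked token.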
Three Pillars Failing Simultaneously
The convergence is the story, not any single element of it:
- Federal intelligence is stepping back. ODNI is pulling back from the systemic state-actor tracking enterprises relied on for free for 20 years. That free service is now a line item with a vendor contract attached.
- 66% of security staff are flight risks. The drivers are structural (authority, career path, flexibility), not compensation. Spot bonuses do not manufacture authority a CISO does not wield.
- AI-augmented offense has lapped defense. GPT-5.5 solved every CTF in a test set. npm supply chain attacks weaponized auto-update at 572K+ weekly downloads. North Korean operators used AI for social engineering netting $577M in crypto.
The organizations that solve retention, fund their own threat intelligence, and put agentic AI governance in before the agents ship will spend the next four quarters building a moat their competitors are actively digging around themselves.
FedRAMP 20x: The Moat Drops
FedRAMP landing zones lower the fixed cost of federal authorization that functioned as an incumbent moat for a decade. Smaller vendors that could never justify the authorization spend now can. A reasonable skeptic would say federal GTM remains slow regardless of cost structure. The skeptic is right about the average case and wrong about the edge: whoever inherits trust from GSA and extends it to dozens of SaaS applications becomes the control plane for federal cloud adoption. Mid-market vendors should be accelerating federal GTM now. Platform companies should be racing to become the authorized landing zone itself.
Action items
- Audit all deployed or planned AI agent systems against Five Eyes guidance within 60 days — focus on excessive privileges, cascading failure risk, and weak auditability
- Evaluate machine identity management capabilities — determine whether your IAM stack can issue cryptographic identities for autonomous AI agents at scale
- Commission a 90-day threat intelligence self-sufficiency assessment given ODNI's withdrawal from systemic state-actor tracking
- Restructure security org to address the three retention drivers: career pathways, genuine authority, and flexibility — not compensation alone
Sources: CyberScoop · CSO Security Leadership · TLDR InfoSec · Risky.Biz · TLDR IT
◆ QUICK HITS
Update: Google ceded safety guardrail veto to Pentagon — DoD can adjust AI safety settings for 'any lawful government purpose' with no vendor override, 600+ employees protested, Pichai approved in 24 hours
Mindstream
HBM memory tightness hit 89.0 (top of scarcity band), rising 3 points/week — SK Hynix, Micron, Samsung all describe 2026 capacity as committed; agent roadmaps are now gated by memory procurement, not model quality
Teng Yan | Chain of Thought
Team size to build competitive AI coding tools compressed 1,000× in 3 years — Copilot (thousands) → Cursor (~1/100th) → Claude Code (started with 2) → OpenClaw (1 person)
🌀 Refactoring
OpenAI has named 'harness engineering' as its internal discipline — engineers build scaffolding (prompts, tools, eval loops, guardrails), agents write the code; AGENTS.md files are the new README
TLDR Dev
Stripe and Cloudflare shipped production agent commerce infrastructure — autonomous agents can now create accounts, buy domains, start subscriptions with a $100/month spending cap and no human in the loop
Unwind AI
Oracle cutting 30,000 jobs to fund $300B AI infrastructure deal with OpenAI — employees reportedly made to document workflows weeks before termination; 600+ workers signed collective demand letter, age-discrimination litigation expected
Simplifying AI
Sierra hit ~$200M ARR at $15B valuation (75× revenue multiple) growing from $100M to $150M+ in three months — sets the clearing price for AI-native enterprise companies with demonstrated velocity
AINews
Brent crude +89% YTD from Strait of Hormuz blockage — Spirit Airlines dead after 34 years, Apple warns Mac shortages will persist months, budget-consumer digital spending erosion expected within two quarters
Morning Brew
Meta deployed AI 'Second Brain' to 60,000 knowledge workers with RAG and agentic capabilities — no longer a pilot; resets the benchmark for what a 'serious' internal AI rollout looks like when your board asks
TLDR Data
S&P and Nasdaq rewriting index rules — halving IPO waiting periods, exempting mega-caps from profitability requirements — explicitly to accommodate SpaceX's mid-June IPO at >$1T valuation; passive capital will flow into AI automatically
The Information AM
◆ Bottom line
The take.
Private equity just captured the AI distribution channel for mid-market companies — $11.5 billion in deployment JVs with Blackstone, Goldman Sachs, and 19 other sponsors — in the same week Uber proved that agentic AI costs 3-4× what enterprise budgets assumed, and an Anthropic co-founder put 60%+ odds on autonomous AI R&D by 2028. The three-year plan is now a two-year plan, it costs more than you budgeted, and the distribution path to your customers is being bought by someone else. The organizations that secure PE-aligned distribution, stress-test AI budgets against revealed economics, and build internal recursive improvement loops this quarter will compound those advantages; the organizations that treat any of these as next quarter's problem will discover they were all this quarter's problem all along.
Frequently asked
- How should I rebuild AI budget assumptions given Uber's disclosed costs?
- Reset your baseline to $500–$2,000 per engineer per month for agentic coding tools, not the $50–$100 figure most 2024 plans used. That puts most organizations 4–10× under-budgeted, and a flat-rate contract structure absorbs the top decile of users who consume roughly 50× the median. Stress-test against a Q3 cash crunch and renegotiate toward consumption pricing with caps before vendors reprice unilaterally.
- What's the fastest way to capture the 17× cost arbitrage from open-weight models?
- Stand up a dual-stack deployment capability within 60 days: proprietary models for differentiated work, open weights for everything else. DeepSeek V4 and IBM Granite 4.1 (30B parameters, 512K context, Apache 2.0) are production-ready for non-differentiated workloads today. The architecture is well understood; the operational capability to route workloads between stacks is what most organizations still need to build.
- Why does PE-mediated distribution behave differently this time than past preferred-vendor lists?
- Sponsors with committed capital in a deployment JV have aligned incentives, reporting cadences, and defaults that historical preferred-vendor programs lacked. Anthropic's $1.5B JV with Blackstone, Hellman & Friedman, Goldman, and General Atlantic, plus OpenAI's $10B consortium, collectively reach tens of thousands of portfolio companies with pre-negotiated deployments. When a GP directs a portfolio CEO to deploy, the vendor evaluation is over before it begins.
- What's the gap between AI engineering and AI research, and why does it matter for the 2028 timeline?
- AI engineering—reproducing papers, shipping code, tuning training runs against specified targets—is largely achieved, while AI research requiring creative paradigm shifts remains uncertain. PostTrainBench at roughly 50% of human performance is the canary metric to watch. If it saturates on the 12–18 month curve that SWE-Bench and CORE-Bench followed, Clark's 60%+ probability by end of 2028 stops looking aggressive and starts looking conservative.
- What does the Five Eyes guidance signal about the regulatory timeline for AI agents?
- The pattern from prior zero-trust directives is joint guidance, then national standards, then procurement requirements, then audit expectations—compressed into roughly 18 months. The guidance deliberately mapped agents onto existing zero-trust, least-privilege, and defense-in-depth frameworks rather than inventing new ones, which signals a fast-ship posture. Pilots being scoped now will face these rules in production, so governance built in upfront is materially cheaper than retrofit.