Edition 2026-05-08 · read as Leader
Microsoft Kills 'AI Everywhere' as $30 Scans Crack Codebases
Sources: 42 · Words: 1,304 · Read: 7 min
Topics: Agentic AI · AI Capital · LLM Inference
◆ The signal
Microsoft killed its 'AI everywhere' strategy this week: it rationalized 81 products, axed Gaming Copilot, and admitted customers called features 'functionally useless'. In the same week, AI-powered zero-day discovery fell to $30 per scan and autonomous red-team agents reached a 95% success rate in under 6 minutes. Your two most urgent recalibrations: triage the AI roadmap to margin-positive outcomes only, and assume your entire codebase is one commodity scan away from full exposure. The era of shipping AI as a feature flag just received its death certificate from the company with the most distribution on earth.
◆ INTELLIGENCE MAP
01 Microsoft's AI Retreat Validates Outcome-Only Thesis
act now · Microsoft confirmed broad AI distribution destroys margins. 81 products rationalized, inference costs dragging earnings. Anthropic's opposite bet (per-result pricing, focused agent outcomes) grew revenue 80x. The 30-point gross margin gap between AI-native (50-60%) and traditional SaaS (80-90%) is structural, not transitional.
- Microsoft products cut: 81
- AI gross margins: 50-60%
- SaaS gross margins: 80-90%
- Copilot user growth: 33%
02 Offense Commoditized at $30/Scan — Defenders Stuck at 55 Days
act now · Zero-day discovery now costs $30-$150 per codebase. Red team agents achieve 95% domain dominance in under 6 minutes. Mozilla found 271 Firefox bugs in one Mythos pass. The defender's 55-day remediation average and 135 new CVEs/day mean the gap is mathematically unfixable with human-speed processes.
- Red team success rate: 95%
- Time to dominance: under 6 min
- Firefox bugs found: 271
- Avg remediation: 55 days
- Attack time: 6 min
- Defense time: 79,200 min (55 days)
03 Agent Load Breaks Infrastructure — GitHub at 85% Uptime
monitor · GitHub's uptime fell to 85% (2-3 hours of daily downtime) under AI agent load 30x above architecture assumptions. The CTO revised the scaling target from 10x to 30x in 4 months. Competitors (GitLab, Vercel, Linear) are absorbing the same growth without failures. Mitchell Hashimoto publicly declared GitHub 'unfit for professional work.'
- GitHub uptime: 85%
- Daily downtime: 2-3 hours
- Load growth (2 yrs)
- Scaling target revision: 10x to 30x
- Architected for: 10x
- Revised target: 30x
- Actual load pattern: 50x
04 State AI Law Closes 'Algorithm Did It' Defense
monitor · Connecticut passed an omnibus AI law (131-17 House, 32-4 Senate) explicitly removing automated decision-making as a defense in discrimination cases. Federal regulation most likely arrives via the NDAA, not a standalone bill, which means compressed timelines and less debate. The 12-18 month window to shape vs. react is closing.
- Revenue threshold
- Fed regulation path: NDAA
- NDAA streak
- Window to shape: 12-18 months
- CT law signed: Q2 2026
- CT enforcement: Q4 2026
- NDAA markup: late 2026
- Federal framework: 2027
05 Per-Seat SaaS Pricing Enters Terminal Phase
background · Stripe shipped 280 features for agentic commerce. HubSpot declared full API parity with UI as survival strategy. Anthropic moved to per-result pricing. When AI agents become primary SaaS consumers, per-seat models see higher utilization but flat revenue, a paradox with a 3-5 year fuse.
- Stripe features shipped: 280
- Pricing model runway: 3-5 years
- SI market at risk
- SI failure rate
◆ DEEP DIVES
01 The AI Feature Sprawl Death Certificate — And the Margin Math That Killed It
Microsoft Proved the Negative
Microsoft's Copilot rationalization is the most instructive strategic signal in enterprise AI this quarter. A company with unlimited frontier-model access, 400 million Office users, and effectively unlimited capital concluded that broad AI distribution destroys value. Customer feedback produced the phrase 'functionally useless.' The earnings call confirmed that inference costs drag margins. Eighty-one distinct products were in flight. Nadella's sequence was consolidation under one executive (Andreou), then killing everything that failed both a customer-value test and a unit-economics test.
If breadth-first AI feature sprawl does not work inside the largest software distribution on earth, the question of whether it works inside a smaller one answers itself.
The Structural Margin Gap
A reasonable skeptic would call this a maturity problem that scale resolves. The skeptic is wrong, and the numbers say so. BVP puts AI company gross margins at 50-60% against 80-90% for traditional SaaS. Reasoning models consume 10-100x the tokens of the prior generation for the same user-visible answer. OpenAI's 1,000x cost reduction over 14 months was eaten by its own model advances. Per-token cost falls. Tokens consumed per task rise faster. A company shipping AI on every surface is running thirty margin-negative line items to fund the one that pays for itself.
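The interaction between a falling per-token price and rising tokens per task can be made concrete with a small sketch. Every figure below is a hypothetical placeholder chosen for illustration, not a number from BVP, OpenAI, or Microsoft, and `cost_per_task` and `gross_margin` are illustrative helper names.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Inference cost in dollars for one completed task."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


def gross_margin(revenue_per_task: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue_per_task - cost) / revenue_per_task


# Hypothetical scenario: the per-token price falls 5x while a reasoning
# model uses 20x the tokens for the same user-visible answer.
old_cost = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=2_000)
new_cost = cost_per_task(price_per_million_tokens=2.0, tokens_per_task=40_000)

revenue = 0.20  # hypothetical revenue attributed to one task, in dollars
old_margin = gross_margin(revenue, old_cost)
new_margin = gross_margin(revenue, new_cost)

print(f"cost per task: ${old_cost:.2f} -> ${new_cost:.2f}")
print(f"gross margin:  {old_margin:.0%} -> {new_margin:.0%}")
```

In this made-up scenario the cost per task rises fourfold even though the token price fell, and the margin drops from roughly 90% to roughly 60%. The point is the shape of the calculation, not the specific inputs.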
The Opposite Bet Is Working
Anthropic's shift to per-result pricing, charging for outcomes rather than tokens, is the structural alternative. It works only because the agents actually complete the work: Claude Code carries a 90% autonomy target, which keeps the vendor's risk of eating inference cost on failed attempts acceptable. Microsoft 365 Copilot, the focused product Microsoft is keeping, grew paying users 33%. Narrow, high-value surfaces where customers pay on purpose outperform broad feature spray by every measure that matters in year two.
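Why the completion rate decides whether per-result pricing is viable can be sketched in a few lines. This is a simplified model, assuming the vendor pays inference cost on every attempt and is paid only for completed results. The price and cost values are hypothetical, and the function names are illustrative.

```python
def expected_margin_per_attempt(price_per_result: float,
                                cost_per_attempt: float,
                                completion_rate: float) -> float:
    """Expected vendor margin per attempt under per-result pricing.

    The vendor pays inference cost on every attempt but is only paid
    when the agent completes the work.
    """
    return completion_rate * price_per_result - cost_per_attempt


def break_even_completion_rate(price_per_result: float,
                               cost_per_attempt: float) -> float:
    """Completion rate at which expected margin is exactly zero."""
    return cost_per_attempt / price_per_result


price = 1.00   # hypothetical price charged per completed result
cost = 0.40    # hypothetical inference cost per attempt

for rate in (0.30, 0.60, 0.90):
    margin = expected_margin_per_attempt(price, cost, rate)
    print(f"completion rate {rate:.0%}: expected margin ${margin:+.2f} per attempt")

print(f"break-even completion rate: {break_even_completion_rate(price, cost):.0%}")
```

With these placeholder inputs the vendor loses money at a 30% completion rate and earns a healthy margin at 90%, which is one way to read why a high autonomy target comes before outcome pricing.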
Portfolio Implications
Every AI feature shipped on inference without corresponding willingness-to-pay is a standing cost against non-existent revenue. The audit is straightforward. Map every AI-powered feature to customer-perceived value and to inference cost. Anything that fails both tests is a margin leak that compounds with scale. Microsoft absorbed it for 18 months. Most organizations cannot absorb it for one quarter.
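The two-test audit can be sketched as a simple triage function. The feature names and dollar figures are invented for illustration, and the unit-economics rule used here (attributed revenue at least covers inference cost) is an assumption; a real audit would substitute its own definition.

```python
from dataclasses import dataclass


@dataclass
class AIFeature:
    name: str
    customers_will_pay: bool        # customer willingness-to-pay test
    monthly_revenue: float          # revenue attributed to the feature
    monthly_inference_cost: float   # inference spend for the feature


def passes_unit_economics(feature: AIFeature) -> bool:
    """Assumed test: attributed revenue at least covers inference cost."""
    return feature.monthly_revenue >= feature.monthly_inference_cost


def triage(feature: AIFeature) -> str:
    """Return 'keep', 'review', or 'cut' based on the two tests."""
    passed = [feature.customers_will_pay, passes_unit_economics(feature)]
    if all(passed):
        return "keep"
    if not any(passed):
        return "cut"      # fails both tests: a margin leak
    return "review"       # fails exactly one test


# Hypothetical portfolio for illustration only.
portfolio = [
    AIFeature("contract-summarizer", True, 40_000.0, 12_000.0),
    AIFeature("sidebar-chat", False, 0.0, 9_000.0),
    AIFeature("smart-compose", True, 3_000.0, 8_000.0),
]

for feature in portfolio:
    print(f"{feature.name}: {triage(feature)}")
```

Features that fail only one test land in a "review" bucket here rather than being cut outright, which mirrors the kill-or-pause framing without deciding the borderline cases automatically.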
The era of competing everywhere with undifferentiated AI is over. Microsoft proved it does not work with infinite resources, which is useful to know before spending finite ones.
Action items
- Audit every AI feature against customer willingness-to-pay AND unit economics — kill or pause anything failing both tests by end of Q2
- Model per-result vs. per-token pricing for your top 5 AI use cases within 30 days
- Centralize AI product ownership under a single executive this quarter
- Establish inference cost budgets per feature — treated with the same seriousness as latency budgets
Sources: Aaron Holmes · 🔳 Turing Post · AI Weekly · The Information AM · TLDR IT
02 Your Codebase Is $30 From Full Exposure — The Offense-Defense Gap Is Now Unfixable at Human Speed
The Numbers That Changed
Three data points landed this week, and taken together they rewrite the economics of defense rather than nudging them:
- $30-$150 per codebase for AI-powered zero-day discovery (IronCurtain framework, open-weight models)
- 95% success rate for autonomous red team agents achieving domain dominance in under 6 minutes (Dreadnode Ares benchmark)
- 271 vulnerabilities found in Firefox, a mature and well-maintained project, in a single Mythos pass
The defender's side of the ledger reads differently. 55-day average remediation against 135 new CVEs per day leaves a running deficit of roughly 7,400 unpatched vulnerabilities at any given moment. Human-speed processes do not close that arithmetic.
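The deficit figure follows from Little's law (items in a system equal the arrival rate multiplied by the average time in the system). The sketch below reproduces it from the numbers cited above. It assumes a steady arrival rate and treats the public CVE stream as a single queue, which is a simplification.

```python
NEW_CVES_PER_DAY = 135          # arrival rate cited above
AVG_REMEDIATION_DAYS = 55       # average time a vulnerability stays open
ATTACK_MINUTES = 6              # time to domain dominance cited above


def steady_state_backlog(arrivals_per_day: float, days_open: float) -> float:
    """Little's law: items in the system = arrival rate x time in system."""
    return arrivals_per_day * days_open


backlog = steady_state_backlog(NEW_CVES_PER_DAY, AVG_REMEDIATION_DAYS)
defense_minutes = AVG_REMEDIATION_DAYS * 24 * 60
speed_ratio = defense_minutes / ATTACK_MINUTES

print(f"open vulnerabilities at steady state: {backlog:,.0f}")
print(f"defense window: {defense_minutes:,} minutes")
print(f"defender-to-attacker time ratio: {speed_ratio:,.0f}x")
```

The backlog comes out at 7,425, and the 55-day window is 79,200 minutes against a 6-minute attack, a ratio of 13,200 to 1.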
When the attacker's economics and the attacker's clock are both on the same side of the ledger, and the defender's procurement cycle is still measured in quarters, the gap is structural, not operational.
The Model-Agnostic Problem
A reasonable skeptic would argue that restricting frontier models still buys time. The reasonable skeptic is already behind the evidence. Niels Provos has shown that older, widely available models with expert orchestration replicate frontier findings, and the UK AI Security Institute found GPT-5.5 (broadly available) may outperform Mythos (restricted) on cybersecurity tasks. Any security strategy that relies on attacker capability being gated by model access is pricing against a world that ended this quarter.
What Changed in Cloud Detection
The attack surface has migrated from exploits to abuse of legitimate APIs. The Mini Shai-Hulud worm crossed npm, PyPI, and Packagist in 48 hours, compromising official SAP and PyTorch Lightning packages. AWS Bedrock AgentCore's S3 access creates bidirectional C2 channels that AWS calls intended behavior. Traditional monitoring architectures are structurally blind to this class of attack, and language diversity in the stack no longer delivers security heterogeneity.
The Strategic Response
Google's VRP restructuring is the leading indicator worth watching. Rewards are dropping for bugs AI can find (commoditized) and rising for hardware exploitation and complex chains (up to $1.5M for Titan M2). The value is migrating from discovery to exploitation prevention and runtime protection. Security spend has to move off incremental improvement of legacy detection and toward AI-native defense that operates at machine speed, because the alternative is a budget defending last quarter's perimeter against this quarter's economics.
| Layer | Old Assumption | New Reality |
| --- | --- | --- |
| Discovery | Scarce, expensive | $30, commodity |
| Exploitation | Requires expertise | 95% automated |
| Remediation | Manageable queue | 55-day structural gap |
| Detection | Network monitoring | Blind to API abuse |

Action items
- Deploy AI-powered offensive testing against your own infrastructure at the $30-150/codebase price point using IronCurtain or equivalent — this week, not next quarter
- Conduct a 'patch wave readiness' assessment: model whether your engineering org can handle 5-10x current CVE volume with 3-day response windows
- Commission an AI agent infrastructure audit — inventory all exposed AI endpoints (MCP servers, Ollama instances, agent sandboxes) and implement VPC isolation
- Present autonomous security investment thesis to the board — frame as operating model change, not tooling purchase
Sources: Clint Gibler · Risky.Biz · TLDR InfoSec · SANS AtRisk · The Hacker News
03 GitHub's 85% Uptime Is a Preview — Your Products Face the Same 30x Agent Load
GitHub's 85% Uptime and the Agent Load Miss
GitHub's measured uptime fell to 85%, which works out to two or three hours of downtime a day. CTO Vlad Fedorov attributed the shortfall to AI agent load being 'much bigger than expected' and disclosed that GitHub revised its scaling target from 10x to 30x in four months. A single developer running Claude Code or Codex now generates the load previously associated with 10-50 developers. Agents run flat around the clock, parallelize aggressively, ignore rate-limit etiquette, and convert failed requests into more requests rather than backing off.
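The relationship between an uptime percentage and hours of daily downtime is a one-line conversion, sketched below. It assumes downtime is spread evenly across days, which real incidents are not, so the reported figures are best read as approximations.

```python
HOURS_PER_DAY = 24


def downtime_hours_per_day(uptime_fraction: float) -> float:
    """Average daily downtime implied by an uptime fraction."""
    return HOURS_PER_DAY * (1.0 - uptime_fraction)


def uptime_from_downtime(hours_down_per_day: float) -> float:
    """Uptime fraction implied by average daily downtime."""
    return 1.0 - hours_down_per_day / HOURS_PER_DAY


print(f"85% uptime -> {downtime_hours_per_day(0.85):.1f} h/day down")
for hours in (2, 3):
    print(f"{hours} h/day down -> {uptime_from_downtime(hours):.1%} uptime")
```

Under this conversion 85% uptime corresponds to about 3.6 hours a day, while two to three hours a day corresponds to roughly 88% to 92%, so the two reported figures describe the same neighborhood rather than an exact match.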
This Is a GitHub Failure, Not an Industry Condition
GitLab, Bitbucket, Vercel, Linear, Railway, and Sentry are absorbing equivalent AI-driven growth without catastrophic reliability failures. Google's SRE teams were preparing for 10x code increases in July 2025, a full year before GitHub's crisis became public. The gap between Google's anticipation and GitHub's surprise is the cleanest available read on which organizations internalized agent-era scaling and which are reacting to it.
Mitchell Hashimoto, HashiCorp founder and 18-year GitHub user, publicly declared it 'unfit for professional work.' That is less a single defection than a permission structure for enterprise procurement conversations that were previously unthinkable.
The Mirror Problem: Portfolio Products Under Agent Load
The harder question is whether the products in your own portfolio are architected for the equivalent 30x load explosion. The customers running agents against GitHub are the customers running agents against every other API they touch. A platform that passes the GitHub dependency audit and fails the 'can our own product handle agent traffic' audit has simply moved the outage from a vendor's status page to its own.
Agent Traffic Is a Different Traffic Class
- No diurnal pattern — load is flat 24/7
- Aggressive retry behavior — failed requests multiply rather than queue
- High parallelization — one user generates 10-50x historical load
- No rate-limit politeness — agents hit limits then spawn parallel paths
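The retry and parallelization behavior in the list above can be sketched as a simple amplification model. It assumes attempts fail independently, retries fire immediately with no backoff, and each failure spawns a fixed number of new attempts. The failure rates and fan-out values are hypothetical, and `attempts_per_request` is an illustrative name.

```python
def attempts_per_request(failure_rate: float, max_retries: int,
                         fanout: int = 1) -> float:
    """Expected attempts generated by one logical request.

    Each failed attempt immediately spawns `fanout` new attempts, up to
    `max_retries` rounds, with no backoff. fanout=1 models a plain retry
    loop; fanout>1 models an agent that opens parallel paths on failure.
    """
    growth = failure_rate * fanout
    return sum(growth ** round_no for round_no in range(max_retries + 1))


# Hypothetical values: half of all attempts fail while the service is degraded.
healthy = attempts_per_request(failure_rate=0.01, max_retries=3)
degraded = attempts_per_request(failure_rate=0.5, max_retries=3)
degraded_parallel = attempts_per_request(failure_rate=0.5, max_retries=3, fanout=3)

print(f"healthy service:            {healthy:.2f} attempts per request")
print(f"degraded, serial retries:   {degraded:.2f} attempts per request")
print(f"degraded, 3 parallel paths: {degraded_parallel:.2f} attempts per request")
```

In this model a degraded service sees load rise exactly when it can least absorb it: serial retries nearly double the attempts per request, and a fan-out of three multiplies them roughly eightfold.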
Every platform sitting in the AI-agent consumption chain was built against human usage patterns. Most will meet the new profile the same way GitHub did: by degrading first and re-architecting second. The strategic question is which products get re-architected before the load arrives and which get re-architected after. The two paths do not cost the same, and they do not look the same to customers twelve months from now.
The security overlay compounds the problem. A critical vulnerability allowing full repository access via git push overlapped with the reliability failures, creating compound risk that should trigger immediate enterprise security review for any on-premises GitHub Enterprise deployment.
Action items
- Audit GitHub dependency across your engineering organization — map which workflows block when GitHub degrades and quantify productivity loss at 85% uptime
- Stress-test your own product architecture against 30x current load with AI agent traffic patterns (high-frequency, programmatic, burst-heavy) by end of Q2
- Evaluate multi-vendor source control strategy or self-hosted alternatives (GitLab, Forgejo) for critical repositories and CI/CD pipelines
- Watch GitLab (GTLB) and developer infrastructure startups for acquisition or partnership as GitHub's crisis creates a rare platform-switching opportunity
Sources: The Pragmatic Engineer · The Information AM · The Download from MIT Technology Review
◆ QUICK HITS
Update: Anthropic reported $30B ARR (80x Q1 growth) — leasing 100% of xAI's Colossus 1 at ~$5B/year because even Google Cloud + AWS cannot supply enough inference compute
The Information AM
Google licensing Gemini through PE firms (Blackstone, KKR, EQT) for portfolio-wide deployment — trading per-deal margin for distribution velocity across thousands of enterprises
TLDR AI
Uber exhausted its entire 2026 AI budget by mid-April and is now cannibalizing hiring budgets to fund AI compute — confirms AI is displacing headcount as primary opex growth category
The Information AM
Meta's internal AI token leaderboard gamed within weeks — engineers scripted millions of tokens for no productive purpose, proving input metrics for AI adoption are fundamentally ungovernable
Engineer's Codex
RAG accuracy collapses from 90.7% to 50.6% when corpus scales from 5K to 500K documents — the demo was never the product, knowledge-graph architectures emerging as successor
Daily Dose of DS
Palo Alto Networks zero-day (CVE-2026-0300): unauthenticated RCE as root on PA-Series/VM-Series firewalls — no patch until May 13, CISA confirms active exploitation
CyberScoop
DPRK IT worker fraud industrialized at scale — 70+ companies (including Fortune 500) infiltrated via remote engineering roles, $1.2M generated from just two caught facilitators
CyberScoop
Pinterest hit first $1B quarter on 80B monthly visual searches — 24% higher conversion than social engagement, proving commercial-intent data is the new ad moat
TLDR Design
Google's WebMCP protocol turns every website into a callable service for AI agents — 6-12 month first-mover window before it becomes table stakes
TLDR Marketing
AI productivity gains plateau at 6 months in organizations that don't redesign operating models — the honest budget split is 50/50 tech vs. org redesign, most run 90/10
TLDR Data
◆ Bottom line
The take.
Microsoft just proved that distributing AI features broadly destroys margins even with unlimited resources. AI-powered zero-day discovery fell to $30 per scan while autonomous red-team agents reached 95% success rates, and the platforms your engineering relies on (GitHub at 85% uptime) are breaking under agent load nobody planned for. The three recalibrations this quarter are: triage every AI feature for margin contribution, assume your codebase is already scanned by commodity tools, and stress-test your infrastructure against 30x the load you architected for.
Frequently asked
- What does Microsoft's Copilot rationalization actually mean for our AI roadmap?
- It means breadth-first AI feature distribution destroys margin even at infinite scale, so every AI feature should be triaged against two tests: customer willingness-to-pay and unit economics. Microsoft consolidated 81 products under one executive and killed anything failing both. With AI gross margins at 50-60% versus 80-90% for traditional SaaS, features that fail both tests are standing margin leaks that compound with usage.
- Why isn't restricting frontier model access enough to slow down AI-powered attackers?
- Because attacker capability is no longer gated by frontier access. Older, widely available models with expert orchestration now replicate frontier findings, and the UK AI Security Institute found broadly available GPT-5.5 may outperform restricted Mythos on cybersecurity tasks. Combined with $30 zero-day scans and 95% autonomous red-team success rates, any defense strategy assuming model gatekeeping buys time is already obsolete.
- How should we think about per-result versus per-token pricing for AI products?
- Per-result pricing transfers inference-cost risk to the vendor and is only viable when agents actually complete the work, which is why Anthropic targets 90% autonomy for Claude Code before charging for outcomes. Enterprise buyers increasingly reject per-seat and per-token models because they cannot forecast spend. Modeling both pricing structures across your top use cases is now a competitive necessity, not a finance exercise.
- Is GitHub's reliability problem an industry condition or a company-specific failure?
- It is a company-specific failure with industry-wide implications. GitLab, Bitbucket, Vercel, Linear, Railway, and Sentry are absorbing comparable AI-agent growth without catastrophic outages, and Google's SRE teams were planning for 10x code increases a full year before GitHub's crisis surfaced. The implication for your portfolio is that agent traffic — flat 24/7, aggressively parallel, retry-heavy — will hit your own APIs the same way, and architectures built for human usage patterns will degrade before they re-architect.
- What's the single most urgent action this week?
- Run AI-powered offensive testing against your own infrastructure at the $30-150 commodity price point using IronCurtain or an equivalent framework. If the tooling is available to consultancies and adversaries at that price, the only question is whether you find the vulnerabilities first. Everything else — pricing model audits, agent-load stress tests, roadmap triage — can wait a sprint. This cannot.