Product daily

Edition 2026-05-13 · read as Product

Hybrid AI Pricing Hits 37%: Instrument Outcomes Before Q2

Sources: 38 · Words: 1,648 · Read: 8 min

Topics: LLM Inference · Agentic AI · AI Regulation

◆ The signal

Kyle Poyar's survey of 230 enterprise software firms shows hybrid pricing (subscription + outcome/usage) jumped from 25% to 37% adoption in a single year, with pure outcome-based projected to hit 31% by mid-2029 — and FedEx's procurement team is already rejecting vendors who can't answer 'what happens to the invoice when the AI does the work instead of the human.' The sprint decision isn't whether to switch pricing models. It's whether your product can measure and attribute the outcomes your AI features produce well enough to defend an invoice by Q2. Instrument your top 3 outcome events before the full report drops May 14.

◆ INTELLIGENCE MAP

  1. 01

    Outcome-Based Pricing Requires Instrumentation This Sprint

    act now

    Hybrid pricing moved 25%→37%→47% projected. Pure outcome-based moves 5%→31%. Monday.com went from 20% headcount growth to flat in one quarter, shipping usage-based AI credits with per-user tracking. ServiceNow's COO says outcome measurement is 'contractually impossible' while FedEx's CDIO says vendors are already doing it. The gap: you can't price what you can't measure.

    37% hybrid pricing adoption · 5 sources
    [Chart: pricing-model adoption (%). Pure outcome: 5 now, 31 in 2029. Hybrid: 37 now, 47 in 2029.]
  2. 02

    Sub-200ms Full-Duplex Kills Turn-Based Voice UX

    monitor

    Thinking Machines shipped TML-Interaction-Small: 276B params (12B active MoE), sub-200ms continuous multimodal input/output, eliminating VAD-based turn detection entirely. Beats GPT-Realtime-2 and Gemini 3.1-Flash on benchmarks. John Schulman: tasks that needed special-purpose systems become zero-shot when the type signature is continuous audio+video+text. Any voice feature assuming turn-taking has a two-quarter shelf life.

    200ms interaction latency · 6 sources
    [Chart: interaction latency (ms). Turn-based (current): 800. Full-duplex (TML): 200.]
  3. 03

    Supply Chain Attacks Hit AI Development Stack Simultaneously

    act now

    Three concurrent supply chain attacks: 84 malicious versions across 42 TanStack npm packages (12M+ weekly downloads), Ollama CVE-2026-7482 exposing 300K servers' heap memory via 3 API calls, and a fake HuggingFace 'OpenAI Privacy Filter' repo hitting 244K downloads before detection. The TeamPCP campaign has been working through CI/CD tools since February. Trusted publishing provided zero protection.

    300K exposed servers · 6 sources
    [Chart: attack scale. Ollama servers: 300 (thousands). Fake HF downloads: 244 (thousands). TanStack packages: 84. npm worm versions: 373.]
  4. 04

    SDK/API Layer Becomes Contested Platform Territory

    monitor

    Anthropic is acquiring Stainless for $300M+ — the SDK generator used by OpenAI and Google. Simultaneously, AI agents are becoming first-class API consumers alongside human developers. The layer between models and developers is no longer commodity plumbing — it's strategic infrastructure. A 4-year-old dev tools startup commanding $300M signals the spec layer is where leverage accrues.

    $300M SDK acquisition price · 5 sources
    [Chart: deal sizes. Anthropic (Stainless): $300M. OpenAI (Tomoro): $4B+. Anthropic (PE JV): $1B+.]
  5. 05

    AI Code Output ≠ Productivity — The Gap Is Quantified

    background

    Pragmatic Engineer confirms: AI generates 100x more code output but ~1x actual productivity gain. Amazon mandated 80%+ AI tool adoption and staff gamed MeshClaw token leaderboards. A Kubernetes dashboard vibe-coded with AI required a full Rust rewrite. Shopify's River agent only works in public channels — optimizing for org learning, not individual velocity. The metric that matters: code still in main after 30 days.

    100x code output vs ~1x outcome · 7 sources
    [Chart: AI code output: 100x. Actual productivity: 1x.]

◆ DEEP DIVES

  1. 01

    Outcome-Based Pricing: The Measurement Sprint That Decides Your 2027 Revenue Model

    The Market Moved. The Telemetry Didn't.

    A pricing manager at a mid-market SaaS company opened her per-seat line item three times last week. The buyer on last month's call had asked her a question she couldn't answer: what happens to the seat count when the AI agent does the work instead of the human? She is not stuck. She is waiting for the first competitor to move. Kyle Poyar's survey of 230 enterprise software firms suggests that move is weeks away, not quarters.

    Hybrid pricing moved from 25% adoption to 37% in a single year. Pure outcome-based is projected to jump from 5% to 31% by mid-2029. The full report drops May 14.

    Two Executives, Two Different Problems

    ServiceNow COO Amit Zavery says outcome measurement is "contractually impossible" because a contract cannot define what the outcome would have been. FedEx CDIO Vishal Talwar says vendors are already doing it, tied to business metrics FedEx wants to hit. Zavery is describing a product problem. Talwar is describing a sales problem a vendor already solved for him. The team that ships outcome attribution infrastructure in the next 18 months walks into CFO offices with "your AI completed 10,000 tasks worth $X each" while competitors defend flat fees.

    Monday.com Is Showing the P&L Shape in Public

    Monday.com moved from 20% planned headcount growth to flat in under 6 months, explicitly citing AI productivity gains. Revenue growth decelerated from 27%+ to 19-20%. Stock is down 48% YTD. The CRO announced customers can now see which employees consume AI credits. That is the metering, attribution, and billing substrate for usage-based AI pricing being built in production. A team shipping AI features without that instrumentation is not shipping a feature with a missing dashboard. They are shipping a feature they cannot reprice later without a migration.


    The 2x2 for This Sprint

    | | Buyer procurement asks for outcomes | Buyer hasn't asked yet |
    | Product measures outcomes | Price on outcomes NOW; charge more than feels comfortable | Move to outcomes anyway; 37% becomes 50% next year |
    | Product only measures activity | Ship subscription + cap, publish the cap publicly (FedEx buyers will ask) | Instrument outcomes this quarter; you're building the pricing architecture for 2027 |

    A third of software firms say outcome-based prices will be hard for customers to forecast. 20% worry it won't expand revenue fast enough. Those are real concerns. They are also engineering problems with known solutions, not structural blockers. The cell to avoid is the one most roadmaps drift into by default: shipping outcome pricing before the telemetry exists to defend the invoice. That conversation ends in a credit memo.

    Action items

    • Identify and instrument your top 3 outcome events (e.g., ticket resolved, contract generated, invoice approved) in the current sprint
    • Model revenue impact of three pricing scenarios: current per-seat, hybrid subscription + usage cap, and pure outcome-based by end of May
    • Interview 5 enterprise customers about which AI outcomes they'd pay for and how they'd measure them before Q3 planning
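
The three-scenario model in the action items can be roughed out before anyone opens a spreadsheet. The sketch below is a minimal illustration, not a pricing recommendation: every price, the cap, and the `Account` fields are made-up assumptions, and none of the numbers come from the survey.

```python
from dataclasses import dataclass


@dataclass
class Account:
    seats: int         # licensed human seats
    ai_outcomes: int   # outcome events the AI completed this month


# All values below are illustrative assumptions, not benchmarks.
SEAT_PRICE = 50.0            # per seat per month
BASE_SUBSCRIPTION = 1_000.0  # hybrid platform fee per month
USAGE_PRICE = 0.50           # hybrid price per outcome
USAGE_CAP = 2_000.0          # published ceiling on hybrid usage charges
OUTCOME_PRICE = 1.25         # pure outcome price per event


def per_seat(a: Account) -> float:
    return a.seats * SEAT_PRICE


def hybrid(a: Account) -> float:
    # Usage charges stop growing once they reach the published cap.
    usage = min(a.ai_outcomes * USAGE_PRICE, USAGE_CAP)
    return BASE_SUBSCRIPTION + usage


def pure_outcome(a: Account) -> float:
    return a.ai_outcomes * OUTCOME_PRICE


def compare(a: Account) -> dict[str, float]:
    return {
        "per_seat": per_seat(a),
        "hybrid": hybrid(a),
        "pure_outcome": pure_outcome(a),
    }


# Same customer before and after AI absorbs work: fewer seats, more outcomes.
before = compare(Account(seats=100, ai_outcomes=1_000))
after = compare(Account(seats=60, ai_outcomes=6_000))
```

Under these assumed numbers, per-seat revenue falls as seats shrink while the other two models rise with AI volume. That is the shape to test against your own data.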

    Sources: Laura Bratton · TLDR Product · US AI in the Enterprise · Martin Peers · Benedict Evans

  2. 02

    Three Simultaneous Supply Chain Attacks — Your AI Build Pipeline Is the Target

    The Scope Is Not One Incident — It's Three

    A developer pulled a routine dependency update last week. The build went green. Nothing looked wrong. Across the ecosystem this week, three simultaneous supply chain campaigns are harvesting credentials from AI development stacks specifically:

    1. TanStack npm compromise: 84 malicious package versions across 42 packages (12M+ weekly downloads). A Bun-based worm exploits optionalDependencies and prepare hooks to steal GitHub tokens, npm tokens, CI credentials, and cloud secrets. Trusted publishing provided zero protection.
    2. Ollama CVE-2026-7482: 300,000 exposed servers leaking heap memory — user prompts, system prompts, API keys, and customer contracts — via just 3 API calls. No authentication required.
    3. Fake HuggingFace repo: "OpenAI Privacy Filter" hit #1 on the platform with 244K downloads before identification as an infostealer linked to the Silver Fox/ValleyRAT campaign.

    The TeamPCP campaign has been systematically compromising CI/CD security tools for 3+ months (Trivy → GitHub Actions → OpenVSX → Jenkins), with each attack building on credentials stolen from the previous one.

    Why This Hits AI Teams Harder

    AI development stacks have uniquely broad attack surfaces: model weights pulled from registries without signing verification, CI/CD pipelines with production credentials running third-party plugins, and local inference servers (Ollama) that graduated from prototype to production without security review. The mental model "local equals secure" is now disproven in three API calls.

    The Specific Checkmarx Jenkins Action

    The Checkmarx Jenkins AST Scanner plugin version 2026.5.09 (published May 9) contains a confirmed backdoor. Roll back to 2.0.13-829.vc72453fa_1c16 from December 2025 immediately. SOCRadar confirmed the compromise. Auto-updating CI/CD plugins are a single point of failure for everything downstream.


    The Fix Has Two Layers

    Immediate (this week): Audit dependencies against TanStack affected versions. Check for Ollama instances anywhere in your stack. Verify HuggingFace model provenance. Rotate all CI/CD tokens and cloud secrets if any match is found.
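
The dependency audit can start as a lockfile scan. The sketch below assumes an npm `package-lock.json` in lockfile format v2 or v3 (the one with a top-level `packages` map). The denylist is a placeholder: the package name and versions in `AFFECTED` are hypothetical and must be replaced with the list from the official advisory.

```python
import json
from pathlib import Path

# Placeholder denylist. The name and versions are hypothetical examples;
# fill this in from the published advisory before relying on the result.
AFFECTED: dict[str, set[str]] = {
    "@tanstack/example-package": {"0.0.1", "0.0.2"},
}


def find_affected(lock: dict, affected: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (package, version) pairs from an npm lockfile that match the denylist."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # Keys look like "node_modules/@scope/name", possibly nested several levels.
        name = path.rpartition("node_modules/")[2]
        version = meta.get("version")
        if name and version in affected.get(name, set()):
            hits.append((name, version))
    return sorted(hits)


def audit_file(lockfile: Path) -> list[tuple[str, str]]:
    """Load a lockfile from disk and check it against AFFECTED."""
    return find_affected(json.loads(lockfile.read_text()), AFFECTED)
```

A non-empty result from `audit_file(Path("package-lock.json"))` is the trigger for the credential rotation described above. An empty result only means the lockfile is clean against the list you supplied.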

    Architectural (this quarter): Pin all CI/CD plugins to reviewed versions with hash verification. Treat model artifacts with the same rigor as code dependencies — provenance verification in the pipeline. Separate build-time from production credentials. GitHub Actions defaults systematically trade security for convenience — audit for injection paths and unnecessary trigger permissions.

    Action items

    • Run emergency dependency audit against TanStack packages and Ollama instances across all environments — rotate exposed credentials immediately
    • Verify Checkmarx Jenkins AST Scanner is not at version 2026.5.09 — roll back to 2.0.13-829 if present and rotate all Jenkins runner secrets
    • Add model provenance verification to your ML pipeline acceptance criteria — no HuggingFace model enters build without hash verification
    • Pin all CI/CD plugins to reviewed versions with SHA verification and separate build-time from production credentials by end of quarter
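
Hash verification for model artifacts needs nothing beyond the standard library. A minimal sketch, assuming the expected SHA-256 digest was recorded from a source you trust when the model was first reviewed. The function names are illustrative, not part of any registry's tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weights do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the pinned value."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())
```

In a pipeline, a `False` result should fail the build rather than log a warning. The pinned digests belong in version control next to the code that loads the model.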

    Sources: TLDR · TLDR InfoSec · Daniel Miessler · SANS NewsBites · TLDR Dev · Techpresso

  3. 03

    Sub-200ms Full-Duplex: The Interaction Model That Makes Turn-Based AI Look Dated

    What Shipped This Week

    Thinking Machines released TML-Interaction-Small, a 276B parameter model (12B active via MoE) that handles images and audio in under 200ms using encoder-free early fusion, emitting "time-aligned microturns" instead of waiting for the user to finish a sentence. It beats GPT-Realtime-2 and Gemini 3.1-Flash on BigBench Audio, IFEval, and FD-bench. John Schulman put it plainly: tasks that previously needed special-purpose systems become zero-shot when the type signature is continuous audio+video+text → audio+text.

    When round-trip latency was 800ms, a walkie-talkie model was the honest design. Under 200ms, the model listens while it speaks. The user interrupts without the system losing its place. That is a different product.

    The Architecture Pattern Worth Stealing

    Here is what users actually do with a voice assistant: they interrupt, they talk over it, they restart mid-sentence. Here is what product decks assume they do: wait politely through a 5-15 second response. The dual-model design closes that gap. A lightweight foreground model carries conversational presence at sub-second latency. An asynchronous background model handles the heavier reasoning. A single-model round-trip is now the thing that needs a justification, not the default shape to start from.
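
The foreground/background split can be sketched with plain asyncio. This is an illustration of the control flow only: both "models" are stand-in coroutines with simulated latency, and the function names are hypothetical rather than anything Thinking Machines has published.

```python
import asyncio


async def foreground_reply(utterance: str) -> str:
    """Stand-in for a small, fast model that keeps the conversation moving."""
    await asyncio.sleep(0.01)  # simulated low latency
    return f"Got it, looking into: {utterance}"


async def background_reasoning(utterance: str) -> str:
    """Stand-in for a slower model doing the heavy work."""
    await asyncio.sleep(0.1)  # simulated long-running reasoning
    return f"Detailed answer for: {utterance}"


async def handle_turn(utterance: str) -> list[str]:
    """Kick off the slow work first, answer quickly, then deliver the full result."""
    transcript = []
    slow = asyncio.create_task(background_reasoning(utterance))
    transcript.append(await foreground_reply(utterance))
    transcript.append(await slow)
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(handle_turn("refund status")):
        print(line)
```

The design choice that matters is that the background task starts before the foreground reply is awaited, so the user hears something useful while the heavy reasoning is already running.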

    Simultaneously: OpenAI Realtime Translate

    OpenAI shipped three Realtime models the same week: Realtime 2 (voice-to-voice), Realtime Translate (70 input → 13 output languages), and Realtime-Whisper for live STT. Real-time audio translation is now a commodity API call at $0.0X/minute. The roadmap question is no longer whether to build it. It is which markets that failed the localization business case six months ago clear the bar today.


    Which Features Are Under Threat

    | | Content-Driven Value | Flow-Driven Value |
    | Turn-Based OK | Dictation, command, single-shot Q&A ✓ | ⚠️ Tutoring, coaching, support: UNDER THREAT |
    | Full-Duplex Required | Live translation, meeting assist | Therapy, sales calls, pair programming: NEXT WAVE |

    The pattern worth planning around: a continuous interaction layer for real-time exchange, paired with autonomous agents doing the slow work in the background. The metric has to change with the architecture. Session length goes up when the UX gets worse and down when it gets better. Time-to-first-useful-response and interruption recovery rate are the numbers to steer by.
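
Both metrics can be computed from a session event log. The definitions below are assumptions made for illustration, since the source names the metrics without defining them. Time-to-first-useful-response is taken as the gap between the first `user_start` and the first `useful_response`. Interruption recovery rate is taken as the share of `interruption` events followed by a `recovered` event before the next interruption. The event names are hypothetical.

```python
from typing import Optional

Event = tuple[float, str]  # (seconds since session start, event type)


def time_to_first_useful_response(events: list[Event]) -> Optional[float]:
    """Seconds from the first user_start to the first useful_response after it."""
    start = next((t for t, kind in events if kind == "user_start"), None)
    if start is None:
        return None
    return next(
        (t - start for t, kind in events if kind == "useful_response" and t >= start),
        None,
    )


def interruption_recovery_rate(events: list[Event]) -> Optional[float]:
    """Fraction of interruptions followed by a recovery before the next interruption."""
    interruptions = recovered = 0
    pending = False
    for _, kind in sorted(events):
        if kind == "interruption":
            interruptions += 1
            pending = True
        elif kind == "recovered" and pending:
            recovered += 1
            pending = False
    return recovered / interruptions if interruptions else None
```

Returning `None` for sessions with no interruptions keeps them out of the average instead of counting them as perfect or failed recoveries.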

    Action items

    • Audit your voice/multimodal features for turn-based assumptions — flag which features would be fundamentally different with continuous-time interaction
    • Prototype the dual foreground/background model pattern for your highest-engagement AI feature this quarter
    • Evaluate OpenAI Realtime Translate API for your highest-traffic multilingual segment — scope a prototype sprint

    Sources: AINews · StrictlyVC · TLDR AI · TLDR Dev · Daily Dose of DS · ben's bites

  4. 04

    Anthropic Buys Stainless for $300M — The API Layer Is No Longer Neutral

    What Was Acquired and Why It Matters

    Anthropic is buying Stainless, a 4-year-old startup that generates typed SDKs for API products, for at least $300 million. Stainless also powers API access for OpenAI and Google. The thing being pitched is "developer experience." The thing being done is pulling a shared dependency out from under two direct competitors.

    What a developer actually does with a model provider is generate a typed SDK in their language of choice, paste it into a service, and never look at it again. The model gets swapped every eighteen months. The SDK gets touched every day. Owning the part that gets touched every day is worth $300M.

    The Agent-as-Consumer Shift

    Agents like Claude Code and OpenClaw now write the calls humans used to write. They need fast, reliable programmatic access, and they are a second persona that operates at machine speed and switches providers the moment an error message is unclear. Anthropic decided owning that interface layer was worth $300M rather than renting it. Codex Skills (one-click installable agent capabilities) and Notion Skills (database-as-app-store with two-way agent sync) show the same pattern from a different angle. The agent skill registry is becoming a platform primitive.

    What This Means for Your API Surface

    For any product exposing an API, the useful question is whether an agent can discover, parse, and compose the capabilities without a human in the loop. The sub-10% non-programmer Skill setup rate that a16z's Olivia Moore flagged on Claude is the UX gap in plain sight. Teams that ship agent-consumable interfaces in the next 2-3 quarters will sit where mobile-first products sat in 2010: early to a surface most competitors are still treating as secondary.


    The Dependency Audit

    For teams currently using Stainless to generate SDKs, the vendor is now owned by one of the three model providers they support. The migration question is not theoretical. It is whether Anthropic will keep shipping neutral, high-quality SDK support for OpenAI and Google after close. The safe move is to make the OpenAPI spec good enough that any generator produces a client worth shipping. The spec is the product. The generator is replaceable.
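
One way to start treating the spec as the product is a small lint pass over it. The sketch below assumes the OpenAPI document is already loaded into a Python dict (for example via `json.load`). The three checks are illustrative choices about what a generator or agent needs, not a complete standard: a stable `operationId`, a human-readable `summary` or `description`, and declared `responses`.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def lint_spec(spec: dict) -> list[str]:
    """Return a list of problems that would hurt generated clients or agent discovery."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            label = f"{method.upper()} {path}"
            if not op.get("operationId"):
                problems.append(f"{label}: missing operationId")
            if not (op.get("summary") or op.get("description")):
                problems.append(f"{label}: missing summary/description")
            if not op.get("responses"):
                problems.append(f"{label}: missing responses")
    return problems
```

Running this in CI and failing on a non-empty list keeps the spec generator-agnostic, which is the contingency plan if the current generator's neutrality changes.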

    Action items

    • Audit your SDK/API dependency on Stainless or similar generators — document switching cost in engineer-hours and create contingency plan
    • Add 'AI agent' as an explicit API consumer persona in your next API design review — audit for machine-readability, structured outputs, and webhook support
    • Evaluate whether your product should ship as a Codex Skill or Notion Skill — assess net-new distribution vs. cannibalization of existing channels

    Sources: The Information · The Information AM · Benedict Evans · TLDR AI · ben's bites

◆ QUICK HITS

  • Anthropic ARR jumped from $9B to $45B in ~5 months (80x annual pace vs 10x plan) — they leased xAI's entire 300MW Colossus datacenter for inference because Claude Code demand exceeds GPU supply

    Benedict Evans

  • Update: OpenAI Deployment Company now has specific context — 150 FDEs from Tomoro acquisition, Palantir-style embed model, and labs are restricting latest models to deployment partnership customers first

    The Information AM

  • ChatGPT launched product feed ads with CPC bidding, replicating Google Shopping's architecture — same product data powers organic answers and paid placements, CPA pricing in development

    MarketingShot

  • Ramp trained a small RL model using Prime Intellect that beats Opus by 4% on exact match accuracy at Haiku latency for spreadsheet Q&A — validates custom small models over frontier API calls for narrow domains

    ben's bites

  • Amazon mandated 80%+ weekly AI tool usage and staff are gaming MeshClaw token leaderboards — Goodhart's Law in real-time; replace volume KPIs with outcome metrics before your dashboard lies to you

    Techpresso

  • Coding Agent Index: >30x cost variation across model+harness pairs for same task — the harness (prompting, caching, tool selection) creates more variation than model choice in most deployments

    AINews

  • AI citation benchmark: only 2-2.5% of URLs appear across all 3 AI engines (ChatGPT, Perplexity, Google AI Overviews) — 91% appear in just one; treat as 3 separate distribution surfaces

    TLDR Marketing

  • Colorado's comprehensive AI law takes effect June 2026 — any product using AI in consequential decisions (employment, lending, insurance) for Colorado users triggers compliance obligations

    a16z AI Policy Brief

  • Spotify verification now requires 3 criteria: consistent listeners (not raw streams), policy compliance, AND identifiable off-platform presence — the third criterion is what AI-generated entities can't cheaply manufacture

    TLDR Design

  • Google auto-links Ads to YouTube channels on June 10 — advertisers get default access to organic video engagement data as targeting signals; configure accounts correctly before the switch

    MarketingShot

◆ Bottom line

The take.

Your AI features need to answer three questions this week that they could dodge last week: Can you measure the outcome well enough to price it (37% of the market already can)? Can you prove your build pipeline wasn't compromised (300K Ollama servers and 84 malicious TanStack package versions say probably not without checking)? And is your voice/multimodal architecture ready for the moment a competitor ships sub-200ms full-duplex interaction against your turn-based features? The common thread: instrumentation is the new moat. Teams that can measure outcomes, verify supply chains, and track interaction quality own the pricing conversation, the security conversation, and the product conversation simultaneously.

— Promit, reading as Product ·

Frequently asked

How do I instrument outcome events when my product was built around per-seat metering?
Start by picking three high-signal completion events your AI feature produces — like ticket resolved, contract generated, or invoice approved — and log them with a stable event ID, timestamp, customer ID, and a confidence or attribution score. You don't need a billing system change to begin; you need the audit trail that lets you defend an invoice line later. The metering substrate Monday.com is exposing publicly (per-employee AI credit consumption) is the same primitive in production form.
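
A minimal sketch of that audit-trail record, using the four fields named in the answer above. The class name, field names, and JSON-lines output are assumptions for illustration. Here the attribution score is treated as the 0-to-1 share of the outcome credited to the AI feature, and how to compute it is left open.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class OutcomeEvent:
    customer_id: str
    outcome_type: str          # e.g. "ticket_resolved"
    attribution_score: float   # assumed range 0.0 to 1.0
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject scores outside the assumed range before they reach the log.
        if not 0.0 <= self.attribution_score <= 1.0:
            raise ValueError("attribution_score must be between 0 and 1")


def to_log_line(event: OutcomeEvent) -> str:
    """Serialize one event as a JSON line suitable for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

Generating the event ID at creation time and making the record immutable means the same line can later back an invoice row without being rewritten.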
What's the right answer when procurement asks what happens to the invoice when AI replaces the human?
The defensible answer pairs a measured outcome with a price per outcome, plus a floor that covers your platform cost. If you can't measure the outcome yet, the honest interim answer is a subscription with a usage cap you publish — not a flat per-seat fee that buyers like FedEx are now rejecting. ServiceNow's 'contractually impossible' framing is losing to vendors who simply tied price to a business metric the buyer already tracks.
Should I prioritize outcome instrumentation or full-duplex voice architecture this quarter?
Outcome instrumentation, unless voice is your core interaction surface. Pricing-model defensibility affects every deal cycle in the next two quarters, while full-duplex is a 2-3 quarter competitive threat concentrated in flow-driven features like tutoring, support, and coaching. If you ship voice, prototype the dual foreground/background pattern in parallel — but the invoice conversation is the one happening in live deals right now.
How exposed is my AI build pipeline to the current supply chain attacks?
Likely more exposed than a typical web stack, because AI pipelines pull model weights, npm packages, and CI plugins from registries with weaker provenance verification. Immediate checks: TanStack package versions, Ollama instances anywhere in the stack, Checkmarx Jenkins AST Scanner version 2026.5.09, and any HuggingFace model pulled without hash verification. If any match, rotate CI/CD tokens and cloud secrets before the next build runs.
Does the Stainless acquisition mean I should migrate off their SDK generator?
Not immediately, but treat your OpenAPI spec as the durable asset and the generator as replaceable. Document the switching cost in engineer-hours now so the decision is ready when neutrality questions surface. The strategic takeaway is broader: agents are becoming primary API consumers, and the team that designs for machine-readable discovery and composition in the next 2-3 quarters captures the same kind of advantage mobile-first products had in 2010.
