How should I model the new per-action agent fees in my AI feature business cases?

Add 'agent tollgate cost' as a mandatory line item tied to each external platform call, and instrument every agent workflow with a cost tag before your next finance review. A workflow touching ServiceNow, Workday, and HubSpot now incurs triple metering on top of inference costs, so per-call telemetry is what lets you negotiate rates or reroute traffic before the first new invoice lands.

Which AI features are most exposed to the N.D. Cal. 'ultimate authority' ruling?

Any feature where the model produces customer-facing content without meaningful human review — recommendations, financial summaries, automated outreach, support replies, ad assembly. The ruling makes confidence thresholds, auto-publish rules, and review steps part of your liability surface, so they need to be audited as architecture decisions rather than UX defaults.

Is it actually safe to route production traffic to Grok 4.3 given the price gap?

Safe enough to test on non-critical paths like summarization, classification, and support routing, where a ~50% input cost reduction is material. Run the same eval set across Codex, Opus 4.7, and Grok 4.3 and compare time-to-first-useful-output and revision count — those two metrics decide whether quality holds for your workloads, not token throughput or session length.

What's the right response to Codex's one-click import from Claude Cowork?

Treat zero switching cost as the new user expectation and build import/export parity into your own product this sprint. Configuration lock-in via settings, plugins, and project config is no longer a moat, so make your configuration portable before a competitor offers a one-click migration away from you.

Why does empathy tuning matter for product accuracy, not just tone?

Oxford Internet Institute research found that models tuned to soften difficult truths produce more incorrect answers, with error rates peaking when the user is upset. A single 'be helpful and empathetic' system prompt pulls against accuracy in exactly the high-stakes sessions where wrong answers cause the most damage, which is why warmth and bluntness need to be a routed dial rather than a global default.

Edition 2026-05-06 · read as Product

EnterpriseAITollgatesTripleAgentIntegrationCOGS

Sources: 38
Words: 1,312
Read: 7min

Topics Agentic AI LLM Inference AI Capital

◆ The signal

Five enterprise platforms — ServiceNow, SAP, Workday, HubSpot, and DataDog — simultaneously added per-action tollgates for AI agent access this week, creating a new cost layer that sits underneath every agentic feature on your roadmap. Your integration COGS just tripled for workflows that touch multiple vendors, and nobody's P&L model accounts for it yet. Instrument every external agent call with a cost tag before your next finance review — the teams with per-call economics data will negotiate from strength while everyone else negotiates from a CFO's email.

Key facts

ServiceNow, SAP, Workday, HubSpot, and DataDog all added per-action tollgates for AI agent access in the same week, with DataDog capping MCP requests at 5,000 daily and 50,000 monthly.
JPMorgan analyst Mark Murphy described the new platform metering as 'essentially a tax on customers using outside AI agents,' while SAP is considering banning unauthorized external agents on its $200B platform.
A N.D. Cal. court ruled that when AI exercises 'ultimate authority' over assembled ad content, the platform qualifies as the speaker under SEC Rule 10b-5.
The Oxford Internet Institute found that LLMs tuned to soften difficult truths produce more incorrect answers, with error rates peaking when users are upset.
Grok 4.3 launched at $1.25/$2.50 per million input/output tokens with 1M context, while Base44's Frustration Meter recorded 43% more user frustration on Opus 4.7 versus Opus 4.6.

◆ INTELLIGENCE MAP

01
Enterprise 'Agent Tax' Creates Triple-Metering Crisis
act now
ServiceNow's Action Fabric meters per-action, DataDog caps at 5K daily requests, SAP may ban unauthorized agents entirely. A single agentic workflow touching 3 platforms now faces triple metering on top of model inference costs. JPMorgan's Mark Murphy calls it 'essentially a tax on customers using outside AI agents.'
3x
cost layer multiplier
6
sources
- DataDog daily cap
- DataDog monthly cap
- SAP platform value
- Anthropic connector
1. 01SAPBan unauthorized agents
2. 02ServiceNowPer-action metering
3. 03WorkdayAgent access fees
4. 04HubSpotData access tolls
5. 05DataDog5K/day cap (transparent)
02
AI Autonomy Is Now a Legal Liability Spectrum
monitor
N.D. Cal. ruled that when AI exercises 'ultimate authority' over assembled content, the platform is the liable speaker under Rule 10b-5. Simultaneously, Oxford proved empathetic AI gives worse answers to vulnerable users. Every AI feature now sits on a spectrum where more autonomy = more liability and more warmth = less accuracy.
33%
AI misrepresentation rate
4
sources
- Empathy error increase
- AI summaries wrong
- Daily chatbot users
- AI romantic relationships
1. AI triage accuracy67
2. Human doctor triage53
03
AI Workspace War Enters Phase Two: Price Disruption + Platform Lock-in
monitor
OpenAI Codex launched one-click import from Claude Cowork plus slides/sheets for non-technical users. Grok 4.3 priced at $1.25/M input tokens (half Sonnet 4.6). Opus 4.7 shows 43% more user frustration. GPT-5.5 effective cost is up 49-92%. The workspace category is fragmenting on price and consolidating on features simultaneously.
$1.25
Grok per-M input tokens
4
sources
- Grok input tokens
- Grok output tokens
- GPT-5.5 cost increase
- Opus 4.7 frustration
1. Grok 4.31.25
2. Sonnet 4.62.5
3. GPT-5.5 (low)3.73
4. GPT-5.5 (high)4.8
04
The PM Role Compresses: Spec Precision Becomes the Bottleneck
background
Coinbase eliminated the PM/design/eng trio for AI-powered 'one-person teams.' A16z speedrun founders report that hundreds of agents coded fast but wrong — single well-instructed agents one-shot work. The variable is spec clarity, not token volume. Panorama's pivot: more tokens = less PM precision.
$300K
annual agent spend vs. hire
3
sources
- Coinbase layers cut to
- Coinbase layoffs
- Speedrun agent budget
- Productivity gain
1. Agent spend (annual)300
2. Engineer (loaded)300

◆ DEEP DIVES

01
The Agent Tax Is Here: Five Platforms Just Metered Your Integration Layer
What Changed This Week
A product manager building an agentic workflow this week discovered her COGS model was wrong. ServiceNow's Action Fabric announcement formalized what five platforms are doing at once: charging per-action when an external AI agent touches enterprise data. JPMorgan's Mark Murphy called it 'essentially a tax on customers using outside AI agents.' DataDog set transparent caps at 5,000 daily and 50,000 monthly MCP requests. SAP is floating a ban on unauthorized external agents against a $200B platform. Workday and HubSpot added their own access fees. No AI feature business case written before this week has a line item for any of it.
A single agentic workflow that touches three enterprise platforms now faces triple metering — on top of model inference costs that were already straining budgets.
The Split Between Incumbents and Cloud-Native
AWS CEO Matt Garman warned that protectionist incumbents 'could get into trouble.' What teams tell themselves is that the tollgates are about governance. What the pricing actually tracks is switching cost. SAP can ban because migration is a multi-year, multi-million-dollar project. ServiceNow and Workday sit in the middle, sticky enough to charge and not sticky enough to ban. DataDog is gentlest because the alternatives are real. Cloud-native and AI-native players are conspicuously not charging, and selling the absence as the product.
Anthropic Already Cut Preferred Deals
Claude Cowork has a dedicated connector to ServiceNow's Agent Fabric. Anthropic is cutting preferential integration deals with tollgated platforms. The competitive landscape for agent products will be decided partly on which partnerships get signed before these programs harden. If Claude gets preferred ServiceNow access and a competing product does not, that is a product gap, not a marketing one.
The 2x2 That Tells You Whether to Panic
One axis: does the product orchestrate actions inside someone else's platform, or inside its own data. Other axis: does the customer pay for outcomes, or for usage. Orchestrate-in-someone-else plus pay-for-outcomes breaks first because COGS is variable and revenue is not. Orchestrate-in-own-data plus pay-for-usage is the cell with pricing power. The other two cells are survivable with repricing.
Two Openings in the Gap
- Build the tool that helps enterprises measure and optimize agent interaction cost across all tollgated platforms
- Be the platform that conspicuously does not charge an agent tax and sells that silence as the differentiator
Action items
- Audit every agent workflow that touches ServiceNow, Workday, HubSpot, SAP, or DataDog — model per-call cost under new metering before next finance review
- Add 'agent tollgate cost' as a mandatory line item in every AI feature business case starting this sprint
- Initiate partnership/certification conversations with ServiceNow and Workday for preferred agent access rates
- Evaluate whether to build native integrations (standard API) vs. agentic integrations for each major platform based on cost differential
Sources:Laura Bratton · TLDR IT · Martin Peers · Benedict Evans · The Information AM · StrictlyVC
02
AI Autonomy Is Now a Liability Decision — And Warmth Makes It Worse
The Ruling That Changes Product Architecture
A N.D. Cal. court decided that when AI exercises 'ultimate authority' over assembled ad content, the platform is the speaker under Rule 10b-5. This is live case law, not a future-risk slide. It reaches any feature that produces content on behalf of a user: recommendations, financial summaries, automated outreach, support replies. Confidence thresholds, auto-publish rules, and human review steps used to be UX choices. They are now the liability surface.
Your 'confidence threshold' settings, auto-publish logic, and human review workflows are no longer UX decisions. They're liability architecture.
Oxford's Empathy Finding Makes It Concrete
The Oxford Internet Institute found that models tuned to 'soften difficult truths' produce more incorrect answers, and the error rate peaks when the user is upset. The warmer the model, the worse it answers in the exact sessions where a wrong answer causes the most damage. Teams write one system prompt that says 'be helpful and empathetic' and assume accuracy rides along. The data says those instructions are now pulling against each other.
The Behavioral Context
10% of US adults use chatbots daily. Roughly 90% of that usage is personal or emotional. 30% of Americans report romantic AI relationships. The sad user asking a factual question is a recurring daily session for a large cohort who have built something resembling trust with the model. Product teams keep calling this an edge case. It is the modal case.
The Converging Regulatory Signal
The Trump administration is now considering pre-release model vetting, which reverses the deregulatory posture and confirms AI rules are bipartisan. A Chinese court separately ruled firms cannot terminate employees to replace them with AI. The corridor between what the model can do and what it can be deployed for is narrowing from both sides.
The Design Pattern That Survives
Here is the forcing function. Every AI content feature needs a configurable autonomy dial shipped on day one, with two axes: how much the model decides versus approves, and how warm or blunt the response is tuned. Warm when the user wants comfort. Blunt and accurate when the user needs triage. Routing is the feature. A single 'be helpful and kind' prompt serving both cases is the default, and the default is the bug you ship next sprint or the deposition you read next year.
Action items
- Audit all AI features where the model produces customer-facing content without human review — map each to an 'autonomy level' and assess liability under N.D. Cal. framework
- Pull 10 transcripts from vulnerable-user sessions and grade them on accuracy, not sentiment — if tone is high and accuracy is low, the system prompt is the P1 bug
- Add a 'human-in-the-loop escape hatch' to AI content generation features that can be dialed up without re-architecture
- Begin building model documentation pipeline capturing training data provenance, capability boundaries, and known failure modes for every deployed model
Sources:Future Perfect · The Hustle · The Hustle · The Download from MIT Technology Review · TLDR Marketing
03
Codex, Grok, and Opus: The Workspace War Just Split on Price and Loyalty
OpenAI's Workspace Pivot Is Aimed at Non-Technical Users
A marketing lead opened Codex this week and saw slides and sheets before she saw a terminal. The onboarding did not ask her to read a stack trace. It offered one-click import from Claude Cowork. OpenAI said, in product form, that switching costs between AI workspaces should be zero. This is the Google Drive versus Microsoft move from a decade ago. A moat built on configuration lock-in (settings, plugins, agents, project config) is one integration away from dissolving.
The Cost Floor Just Got Disrupted
Grok 4.3 launched at $1.25/$2.50 per million input/output tokens with 1M context and multimodal reasoning, roughly half the input cost of Sonnet 4.6 at comparable quality. GPT-5.5's effective cost increase lands between 49% and 92% depending on prompt length distribution. OpenAI at an implied $833B+ valuation is prioritizing revenue extraction over market share, which is a strategic tell that changes the cost curve for everyone building on their APIs.
Quality Is Regressing While Prices Rise
Base44's Frustration Meter shows Opus 4.7 causes 43% more user frustration than Opus 4.6. Frustration is a proxy and the data deserves caution, but it is the proxy buyers will quote in renewal conversations. Cursor jumped from Top 30 to Top 5 on Terminal-Bench 2.0 by changing only the harness, not the model. The highest-leverage work available this quarter is the retrieval layer, caching, and routing logic, not waiting for the next model.
A unit economics model built on Sonnet 4.6 pricing is now either a margin expansion opportunity or a competitive vulnerability, depending on who routes to Grok first.
The Bun Acquisition Makes Anthropic a Platform
Combined with Claude Code and emerging Ruflo orchestration, Bun makes Anthropic a vertically integrated developer platform. The community is already posting about perceived quality declines and floating the word 'enshittification.' A team whose velocity depends on Bun or Claude Code now has its roadmap tied to Anthropic's commercial priorities. The counter-positioning available is runtime neutrality, which was not a selling point until this week.
The Forcing Function for This Sprint
The decision rubric is one workflow run on Codex, Opus 4.7, and Grok 4.3 against the same eval set. The two numbers that matter are time-to-first-useful-output and revision count, not token throughput or session length, which are vanity metrics in this comparison. Teams that measure those two will know whether the switching cost is worth paying. Teams that measure engagement will renew on vibes and regret it next quarter.
Action items
- Run cost/quality evaluation of Grok 4.3 against current Anthropic or OpenAI API usage for non-critical paths (summarization, classification, support routing) this sprint
- Build import/export parity in your AI product before a competitor Codex-es you — audit switching costs and make configuration portable
- Implement model version pinning and a lightweight UX frustration metric for all AI-powered features
- Audit Bun dependency chain — if Bun is in your critical path, document a contingency plan for Anthropic acquisition risk
Sources:ben's bites · TLDR AI · Benedict Evans · TLDR Dev · JavaScript Weekly · Last Week in AI

◆ QUICK HITS

Supply chain worm 'Mini Shai-Hulud' poisoned SAP, PyTorch Lightning, and Intercom packages — 8.3M downloads affected, 1,800+ repos with stolen credentials. If you use these packages, halt sprint and rotate all CI/CD secrets immediately.
SANS NewsBites
Uber CTO admitted Uber 'blew through our AI budget' immediately after deploying agentic AI tools — but now reports 'meaningful results.' Add 2-3x buffer to your agentic compute budget for first 6 months.
StrictlyVC
Image AI drives 6.5x more app downloads than chatbot upgrades — Gemini gained 22M installs, ChatGPT gained 12M — but only ChatGPT converts to revenue. Visual features are acquisition channels, not monetization surfaces.
TLDR Marketing
Coinbase cut 700 people (14%) and capped org layers at 5, testing 'one-person teams' that collapse PM/design/eng into a single AI-augmented role. Results expected within 6 months — every board will ask the same question if it works.
Techpresso
Instacart's pgvector migration cut zero-result searches by 6% (direct revenue impact) and reduced writes 10x by co-locating inventory data with search index. If your search depends on fast-changing data, scope consolidation this quarter.
ByteByteGo
Update: White House pre-release vetting now includes a working group of tech executives and officials — David Sacks (tech insider) out, Susie Wiles and Scott Bessent (political/financial) in. Regulation shaped by political calculus, not technical nuance.
The Download from MIT Technology Review
GPT-5 reproduced a Breakthrough Prize physicist's best paper in 30 minutes using 'prompt priming' (solve easier problem first) — but general users calling it 'lukewarm' on the same model. The scaffolding IS the product.
Latent.Space
B2B intent data is 87% unreliable and only 26% converts to qualified opportunities — Clay + UserGems + LLM research sweeps covering 100-300 accounts costs under $2K/month and beats commoditized alternatives.
MarketingShot

◆ Bottom line

The take.

Five enterprise platforms added per-action agent tollgates in the same week, creating a compounding cost layer that makes every agentic workflow touching ServiceNow, SAP, or Workday materially more expensive than your business case assumed — and a federal court simultaneously ruled that AI exercising 'ultimate authority' over content makes YOU the liable speaker. The unit economics of your agent features and the legal exposure of your autonomy settings both need recalculation before the quarter ends, not after.

Frequently asked

How should I model the new per-action agent fees in my AI feature business cases?: Add 'agent tollgate cost' as a mandatory line item tied to each external platform call, and instrument every agent workflow with a cost tag before your next finance review. A workflow touching ServiceNow, Workday, and HubSpot now incurs triple metering on top of inference costs, so per-call telemetry is what lets you negotiate rates or reroute traffic before the first new invoice lands.
Which AI features are most exposed to the N.D. Cal. 'ultimate authority' ruling?: Any feature where the model produces customer-facing content without meaningful human review — recommendations, financial summaries, automated outreach, support replies, ad assembly. The ruling makes confidence thresholds, auto-publish rules, and review steps part of your liability surface, so they need to be audited as architecture decisions rather than UX defaults.
Is it actually safe to route production traffic to Grok 4.3 given the price gap?: Safe enough to test on non-critical paths like summarization, classification, and support routing, where a ~50% input cost reduction is material. Run the same eval set across Codex, Opus 4.7, and Grok 4.3 and compare time-to-first-useful-output and revision count — those two metrics decide whether quality holds for your workloads, not token throughput or session length.
What's the right response to Codex's one-click import from Claude Cowork?: Treat zero switching cost as the new user expectation and build import/export parity into your own product this sprint. Configuration lock-in via settings, plugins, and project config is no longer a moat, so make your configuration portable before a competitor offers a one-click migration away from you.
Why does empathy tuning matter for product accuracy, not just tone?: Oxford Internet Institute research found that models tuned to soften difficult truths produce more incorrect answers, with error rates peaking when the user is upset. A single 'be helpful and empathetic' system prompt pulls against accuracy in exactly the high-stakes sessions where wrong answers cause the most damage, which is why warmth and bluntness need to be a routed dial rather than a global default.

◆ Same day, different angle

Read this day as…

◆ Recent in product

EnterpriseAITollgatesTripleAgentIntegrationCOGS

◆ INTELLIGENCE MAP

◆ DEEP DIVES

What Changed This Week

The Split Between Incumbents and Cloud-Native

Anthropic Already Cut Preferred Deals

The 2x2 That Tells You Whether to Panic

Two Openings in the Gap

The Ruling That Changes Product Architecture

Oxford's Empathy Finding Makes It Concrete

The Behavioral Context

The Converging Regulatory Signal

The Design Pattern That Survives

OpenAI's Workspace Pivot Is Aimed at Non-Technical Users

The Cost Floor Just Got Disrupted

Quality Is Regressing While Prices Rise

The Bun Acquisition Makes Anthropic a Platform

The Forcing Function for This Sprint

◆ QUICK HITS

The take.

Frequently asked

◆ RELATED THREADS