Engineer daily

Edition 2026-05-07 · read as Engineer

Slopsquatting: North Korean APTs Exploit LLM Hallucinations

Sources: 35 · Words: 1,126 · Read: 6 min

Topics: Agentic AI · LLM Inference · Data Infrastructure

◆ The signal

North Korean APTs are registering package names that LLMs hallucinate, a technique called 'slopsquatting' that turns your AI coding assistant into an unwitting supply-chain attack vector. The hallucinations are reproducible across users and sessions, so squatting them pays off reliably. Your CI pipeline needs a dependency allowlist that rejects any package not already in your lockfile unless a human explicitly approves it. Do this today, not next sprint.

◆ INTELLIGENCE MAP

  1. 01

    AI-Assisted Supply Chain Attacks: Slopsquatting + Certificate Compromise

    act now

    Two new supply chain vectors converged this week. Slopsquatting exploits reproducible LLM hallucinations of package names: adversaries register them on npm/PyPI and wait for the next agent install. The DAEMON Tools compromise shows a signed installer shipping malware for 28 days under a legitimate certificate. Both bypass existing controls.

    28 days undetected (DAEMON) · 5 sources

    Tracked: Attack surface · DAEMON Tools C2 · MCP CVEs filed · Affected downloads

    Timeline:
    1. Apr 8: DAEMON Tools signed malware begins shipping
    2. Apr 28: cPanel zero-day exploitation begins
    3. May 5: DAEMON compromise detected (28 days)
    4. May 7: Slopsquatting attack class disclosed
    5. Jun 30: cPanel 64-day window closes
  2. 02

    Active Exploitation: PAN-OS Zero-Day + cPanel Mass Compromise

    act now

    CVE-2026-0300 in PAN-OS is being actively exploited against User-ID Authentication Portals, with no patch until mid-May. cPanel CVE-2026-41940 has already hit 44,000+ hosts, 7,135 of them with ChaCha20-based ransomware, after a 64-day zero-day window. CISA is floating compression of patching deadlines from 14 days to 3.

    44,000+ cPanel hosts compromised · 4 sources

    Tracked: PAN-OS patch ETA · cPanel zero-day window · Proposed patch SLA · Ransomware variant

    1. cPanel hosts hit: 44,000
    2. Hosts with ransomware: 7,135
    3. Hosts attacking others: 15,448
  3. 03

    Multi-Token Prediction Ships Across Inference Stack

    monitor

    Gemma 4's 78M-parameter MTP drafter delivers 2-3x decoding speedup with day-0 support in vLLM, SGLang, Ollama, MLX, and AI Edge. llama.cpp PR #22673 shows 75% acceptance rate and >2x throughput on consumer hardware. This is a free inference upgrade for self-hosted models — no quality degradation.

    3x decoding speedup · 3 sources

    Tracked: Drafter size · Acceptance rate · Size ratio · Frameworks supported

    1. Gemma 4 MTP claim: 3x
    2. llama.cpp measured: 2.2x
    3. Theoretical max: 3.25x
  4. 04

    Vision Agents 45x Cost Gap + 30% Hallucination Floor Persists

    monitor

    Vision-based agents cost 45x more per task than structured API agents — a structural gap that better models won't close. Meanwhile, independent research puts frontier models (Opus-4.5 with web search) at ~30% hallucination in multi-turn conversations. GPT-5.5 Instant benchmarks improved but the multi-turn floor hasn't moved.

    45x vision vs API cost gap · 4 sources

    Tracked: Multi-turn hallucination rate · GPT-5.5 AIME gain · Per-step reliability · 3-step end-to-end

    1. API agent cost/task: 1x
    2. Vision agent cost/task: 45x
  5. 05

    Agent Architecture Patterns Maturing: Skills, Isolation, Live Navigation

    background

    Three patterns converging: SKILL.md files with LLM-native routing replace plugin registries. Onyx's strict agent isolation prevents context contamination across RAG stages. Scout replaces embedding pipelines with live API navigation at query time. SubQ claims 12M-token linear context but remains unverified.

    80% token reduction (pre-index) · 4 sources

    Tracked: SubQ context claim · Tool call reduction · Token savings · Independent verification

    1. Pre-indexing token savings: 80%
    2. Tool call reduction: 40%
    3. SubQ vs FlashAttention: 52x

◆ DEEP DIVES

  1. 01

    Slopsquatting: Your AI Coding Agent Is Now a Supply Chain Attack Vector

    The Mechanism Is Boring, Which Is Why It Works

    An LLM suggests installing requests-auth-helper. The package does not exist yet, but a week later someone publishes it on npm or PyPI, after watching suggestion logs or enumerating plausible names. The next agent run installs it, the payload executes at install time, and the developer never typed the name.

    The attacker does not need to guess what you will mistype. The model guesses for them.

    Call it hallucination-squatting at scale: one model, used by millions of developers, emits the same plausible-but-nonexistent name often enough that squatting it is a reliable yield. North Korean APT groups noticed before security teams did.

    Why Existing Controls Fail

    The failure mode compounds with MCP STDIO vulnerabilities disclosed this week: 150M+ affected downloads across 30+ disclosures and 10+ CVEs from one root cause in the Model Context Protocol transport layer. The framing layer parses untrusted bytes before any model code runs, so a hostile tool response reaches a local shell before anyone reads a prompt.

    The DAEMON Tools incident shows legitimate code signing certificates can be compromised. C2 ran over QUIC for 28 days undetected because most network monitoring treats QUIC as "UDP/443, probably Chrome," so the install looked signed and trusted while the payload executed anyway.

    The Fix Is Three Controls, None Default

    1. Lockfile verification in CI: reject any package not already in your manifest without explicit human approval, and gate AI-generated dependency additions specifically.
    2. Registry allowlist at the network layer: for fully agentic workflows (Devin-style), sandbox execution with registry allowlisting so the agent cannot reach arbitrary packages.
    3. Install-time sandboxing, so that a malicious setup.py cannot read ~/.aws/credentials on first run. On the tooling side, Socket.dev flags newly published packages, and lockfile-lint checks that lockfile entries resolve only to approved registry hosts.
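
The first control can be sketched as a small CI check. This is a minimal sketch, assuming an npm v2/v3-style package-lock.json whose `packages` map is keyed by `node_modules/...` paths. The function names and the `approved` set (names a human has explicitly signed off on) are hypothetical and do not come from the sources.

```python
import json
from typing import List, Set


def locked_package_names(lockfile_text: str) -> Set[str]:
    """Return package names found in an npm v2/v3-style package-lock.json."""
    lock = json.loads(lockfile_text)
    names = set()
    for path in lock.get("packages", {}):
        if path:  # the empty key "" is the root project itself
            # "node_modules/a/node_modules/b" -> "b"; scoped names stay "@scope/name"
            names.add(path.split("node_modules/")[-1])
    return names


def unapproved_additions(base_lock: str, head_lock: str, approved: Set[str]) -> List[str]:
    """Packages present in the head lockfile, absent from base, and not approved."""
    added = locked_package_names(head_lock) - locked_package_names(base_lock)
    return sorted(added - approved)
```

A CI job could fail the build whenever `unapproved_additions` returns a non-empty list, comparing the lockfile on the target branch against the one in the pull request.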

    Cross-Source Pattern

    Five independent sources flagged overlapping aspects of this threat this week, and the convergence tells the story: agent autonomy scales faster than agent auditing. The vector varies across slopsquatted packages, MCP STDIO RCE, and certificate-compromised installers, but the pattern holds: install-time code execution moved inside the agent loop, while lockfile gates and network allowlists stayed where they were a year ago.

    Action items

    • Add a pre-commit hook that fails on any new dependency not already in the lockfile — specifically targeting AI-generated additions
    • Disable auto-installation in AI coding agent configs (Copilot, Cursor, Claude Code) — set agents to read-only mode for dependency manifests
    • Deploy CI rule: reject any package whose name was added to lockfile in the last 7 days without a human-authored commit touching the manifest
    • Inventory all MCP-compatible tooling and assess STDIO transport usage; sandbox AI coding agents in containers with restricted network and filesystem access
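
The 7-day rule in the action items can be expressed as a pure function. This is a sketch under assumptions: the `DependencyAddition` record and the `human_authored` flag are hypothetical, and how a pipeline decides that a commit was human-authored (commit author, trailers, review metadata) is left to the team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class DependencyAddition:
    name: str
    added_at: datetime      # when the name first appeared in the lockfile
    human_authored: bool    # True if a human-authored commit touched the manifest


def needs_review(
    additions: Iterable[DependencyAddition],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> List[str]:
    """Names added inside the window without a human-authored manifest commit."""
    return [
        a.name
        for a in additions
        if now - a.added_at <= window and not a.human_authored
    ]
```

Keeping the rule free of git and registry calls makes it easy to unit test. The CI wrapper only has to build the list of additions.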

    Sources: CSO First Look · TLDR InfoSec · Daniel Miessler · The Hacker News · Risky.Biz

  2. 02

    PAN-OS Active Exploitation + Code Signing Trust Collapse — Two Fires, One Week

    PAN-OS CVE-2026-0300: The Firewall Is the Foothold

    Buffer overflow in the management plane, confirmed exploited in the wild, with scanning live before most teams finished standup. No misconfiguration required. Exposure is the bug. If the management interface is reachable from anything not under your own control, that is the fire.

    Patch ETA is mid-to-late May, which is two more weeks of exposure on the late end. The only mitigation is access restriction: VPN, IP allowlist, or offline.

    The perimeter device is only a perimeter while it is not the foothold.

    Post-compromise behavior follows a pattern. Persistence lands on the appliance itself, because rebooting a firewall does not feel like incident response. Check logs for unexpected admin sessions, new config commits, and outbound connections to anything that is not a known update endpoint.
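
The admin-session check above can be automated once session source addresses are exported from the appliance. This is a minimal sketch using Python's standard `ipaddress` module. The function name is hypothetical, and the input is assumed to be a plain list of source IP strings already pulled from logs, not a PAN-OS log format.

```python
import ipaddress
from typing import Iterable, List


def unexpected_admin_sources(session_ips: Iterable[str], allowed_cidrs: Iterable[str]) -> List[str]:
    """Return session source IPs that fall outside every allowed network."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged = []
    for raw in session_ips:
        ip = ipaddress.ip_address(raw)
        # An address never matches a network of the other IP version.
        if not any(ip in net for net in networks):
            flagged.append(raw)
    return flagged
```

For example, with an allowlist of `10.0.0.0/8`, a session from `203.0.113.7` would be flagged while one from `10.1.2.3` would not.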

    cPanel: 44,000+ Hosts, 64-Day Window

    CVE-2026-41940 (CVSS 9.8), a CRLF injection auth bypass in cPanel/WHM, was exploited for 64 days before disclosure. The tally: 7,135 hosts with ransomware, 15,448 hosts participating in attacks in a single day, and a Mirai variant deployed for DDoS. "Sorry" ransomware uses ChaCha20 (fast, no AES-NI dependency on older shared hosts) with RSA-2048 key wrapping. Textbook and unbreakable without the key.

    The real failure is inventory. The cPanel box someone spun up for a marketing site three years ago. The staging environment inherited from an acquisition. Without a complete cPanel inventory there is no claim of safety.

    DAEMON Tools: Signing Trust Is Not Safety

    A valid certificate chain and the real distribution source delivered malware for 28 days. Signing proves provenance, not intent. When the loader presents a valid chain to a trusted root, EDR code-integrity checks return true and telemetry gets downgraded to noise.

    Attack                 | Duration                | Detection gap
    PAN-OS CVE-2026-0300   | Ongoing (no patch)      | Management plane exposure
    cPanel CVE-2026-41940  | 64 days pre-disclosure  | No tenant-level visibility
    DAEMON Tools           | 28 days                 | Valid signature bypassed EDR

    CISA's Response: 3-Day Patching

    CISA is floating compression of the remediation window from 14 days to 3. Three days is not a patch window. It is a release pipeline requirement. If production deploy currently takes a week, the math does not work. Release engineering needs emergency paths that are actually tested, not runbook entries that have never been exercised.

    Action items

    • Verify PAN-OS version across all Palo Alto devices and restrict management plane access to known IPs within 24 hours
    • Inventory all cPanel/WHM instances including shadow IT and acquisitions; patch or isolate by end of week
    • Audit code signing trust: pin allowlists to certificate thumbprints, not subject strings; instrument QUIC egress visibility
    • Measure your patch-to-production pipeline latency — can you ship a critical fix in under 72 hours including validation?
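
Thumbprint pinning, as opposed to matching on subject strings, can be sketched as follows. This assumes the certificate is available as DER-encoded bytes and uses a SHA-256 digest as the thumbprint. The function names are hypothetical, and some platforms report SHA-1 thumbprints instead.

```python
import hashlib
from typing import Set


def thumbprint(der_cert: bytes) -> str:
    """SHA-256 hex digest of the DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned: Set[str]) -> bool:
    """True only if this exact certificate is on the allowlist."""
    return thumbprint(der_cert) in pinned
```

Two certificates with the same subject string produce different thumbprints, so a reissued or lookalike certificate does not inherit trust. Pinning narrows trust to specific certificates; it does not by itself detect misuse of a certificate that is already pinned.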

    Sources: CSO First Look · Daniel Miessler · The Hacker News · Risky.Biz · Techpresso

  3. 03

    Multi-Token Prediction Ships Production-Ready — The Free 2-3x Inference Speedup

    What Actually Shipped

    Gemma 4's 78M-parameter MTP drafter landed in vLLM, SGLang, MLX, Ollama, and AI Edge on day zero. llama.cpp PR #22673 reports >2x throughput at ~75% acceptance with 3 draft tokens against Qwen3.6 27B. The mechanism is not new: a small model proposes k tokens, the big model verifies them in one forward pass, accepted prefixes advance the decode pointer.
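
The draft-then-verify cycle can be illustrated with a toy loop. This is a sketch of greedy speculative decoding only, not the Gemma 4, vLLM, or llama.cpp implementation. `draft_next` and `target_next` are hypothetical callables that return the next token for a prefix, and a real engine would obtain all verification results from one batched forward pass instead of calling the target repeatedly.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken, k: int
) -> Tuple[List[Token], int]:
    """One draft-then-verify cycle. Returns (new_tokens, accepted_draft_count)."""
    # Draft k tokens autoregressively with the small model.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # Verify: compare each drafted token with the target's greedy choice.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            return accepted + [expected], len(accepted)
        accepted.append(tok)

    # Every draft matched, so the target also contributes one bonus token.
    return accepted + [target_next(prefix + accepted)], k
```

Each cycle emits between 1 and k + 1 tokens, and under greedy verification the output is identical to what the target model alone would have produced.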

    The Arithmetic

    1 + (3 × 0.75) = 3.25 tokens per verification cycle. At a 350:1 size ratio the drafter cost rounds to zero. The ceiling is acceptance rate on your traffic, not the benchmark's.
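
The arithmetic depends on what "acceptance rate" means. The sketch below contrasts two readings; both function names are hypothetical. The first treats 75% as the fraction of all drafted tokens that are accepted, which reproduces the 3.25 figure. The second assumes each draft position is accepted independently with probability 0.75 and that the first rejection discards the rest, which gives a lower number.

```python
def expected_tokens_aggregate(k: int, accept_rate: float) -> float:
    """Tokens per cycle when accept_rate = accepted drafts / total drafts."""
    return 1 + k * accept_rate


def expected_tokens_per_position(k: int, alpha: float) -> float:
    """Tokens per cycle when each position is accepted with probability alpha
    and acceptance stops at the first rejection: 1 + alpha + ... + alpha**k."""
    return sum(alpha ** i for i in range(k + 1))


print(expected_tokens_aggregate(3, 0.75))     # 3.25
print(expected_tokens_per_position(3, 0.75))  # 2.734375
```

Which reading applies depends on how the serving stack reports acceptance, so it is worth confirming the definition before comparing numbers across frameworks.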

    This is a latency-SLA technique, not a throughput technique. If GPUs are already saturated, the drafter competes with your own requests for the same KV cache.

    What to Benchmark, In Order

    1. Acceptance rate on the actual prompt distribution. Code traffic and chat traffic do not share an acceptance curve.
    2. End-to-end latency at real batch sizes. Speculative decoding behaves differently under contention because verification competes for KV cache.
    3. Tail latency (p99). The mean can improve while p99 degrades when rejections cluster.
    4. Throughput per GPU. The number finance cares about, quoted least often.
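
The first three measurements can be computed from raw samples with the standard library. This is a minimal sketch with hypothetical function names. It assumes per-request latencies in milliseconds and counts of drafted and accepted tokens have already been collected, and it uses the nearest-rank definition of p99.

```python
import statistics
from typing import Dict, Sequence


def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens the target model accepted."""
    return accepted / drafted if drafted else 0.0


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], accepted: int, drafted: int) -> Dict[str, float]:
    return {
        "acceptance_rate": acceptance_rate(accepted, drafted),
        "mean_ms": statistics.fmean(latencies_ms),
        "p99_ms": p99(latencies_ms),
    }
```

As an illustration with made-up samples: 100 requests at a flat 25 ms have a mean and p99 of 25 ms, while 98 requests at 10 ms plus two slow ones at 500 ms and 900 ms have a better mean (23.8 ms) but a p99 of 500 ms.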

    Cross-Stack Context

    Shipping alongside this: a 60x cold-start reduction that serves weights from GPUs already holding them rather than pulling from cloud storage, and DeepMind's Decoupled DiLoCo at 88% goodput versus 27% standard with 240x less inter-datacenter bandwidth. The inference optimization surface is wide open this quarter.

    The honest version. 3x is a ceiling measured on favorable traffic with a warm cache. Production numbers will be lower. Deploy behind a flag you can roll back. At 78M parameters of overhead and no measured quality loss, the risk-reward is asymmetric.

    ProgramBench Reality Check

    Inference gets faster. Models still score 0% on ProgramBench's whole-repo generation tasks (SQLite, FFmpeg, a PHP compiler from spec). They pass >50% of individual tests but cannot hold system coherence across hundreds of interacting components. The implication for agent architecture is to decompose work into independently verifiable units. Single-shot whole-system generation does not work yet.

    Action items

    • Pull llama.cpp PR #22673 and benchmark MTP against your Qwen3/Gemma 4 models on actual workloads — measure acceptance rate, not just throughput
    • If running vLLM or SGLang in production, enable Gemma 4 MTP drafter on a canary deployment and measure tokens/sec delta against current config
    • Design agent architectures around verifiable sub-task decomposition — do not build whole-repo generation pipelines that assume single-shot correctness

    Sources: TLDR Dev · AINews · TLDR AI

◆ QUICK HITS

  • 25.7% of Stripe webhook endpoints skip signature verification — grep your payment handlers for `constructEvent` and fix the missing call today

    TLDR InfoSec

  • Windows Server 2025 dMSA Ouroboros: 6-command AD persistence technique survives password rotation and account deletion — Microsoft declined to patch, detection is the only lever

    TLDR InfoSec

  • Airbnb isolates monitoring from the service mesh to prevent circular dependency failure — the fix is a dead man's switch on a completely independent channel

    TLDR Dev

  • Databricks built Pantheon (custom TSDB at 10T samples/day) and Hydra (50x cheaper high-cardinality path via Lakehouse) — steal the tiered pattern: alert on aggregates in TSDB, dump raw data to Parquet

    TLDR Dev

  • Bishop Fox released AIMap for discovering exposed AI agent infrastructure using Nuclei templates — run it against your external attack surface before someone else does

    Daniel Miessler

  • SubQ claims 12M token context at 52x faster than FlashAttention — unverified, no third-party evals. Do not redesign RAG pipelines on a press release

    Unwind AI

  • Update: Anthropic $200B Google Cloud commitment — treat Claude and GCP as correlated failure domains; a Google region event is now an Anthropic event

    The Algorithmic Bridge

  • Meta shipped MCP server with 29 tools for ad management — strongest signal yet that MCP is settling as the SaaS integration layer, not a Claude-desktop curiosity

    TLDR Marketing

  • Ollama 'Bleeding Llama' vulnerability leaks process memory unauthenticated — patch immediately and put auth in front of every instance regardless of network trust

    Risky.Biz

  • Dynamic pricing bans active in Maryland (Oct 1, 2026) with ~33 states drafting similar legislation — if your pricing service takes user signals as inputs, add jurisdiction-aware feature flags now

    The Hustle
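
For the Stripe webhook item above, the check that `constructEvent` performs can be sketched with the standard library. This follows Stripe's documented scheme: the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<signature>` values, and the signature is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint secret. The function name and the `now` parameter are hypothetical, and in production the official library call is the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(
    payload: bytes,
    header: str,
    secret: str,
    tolerance_s: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Validate a Stripe-Signature header against the raw request body."""
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in parts if k.strip() == "t"), None)
    candidates = [v for k, v in parts if k.strip() == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale events to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

The raw request bytes must be used as `payload`. Re-serializing parsed JSON changes the bytes and makes valid signatures fail.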

◆ Bottom line

The take.

Your AI coding assistant is now a supply chain attack vector: North Korean APTs are registering the package names LLMs hallucinate, and most CI pipelines have no gate to catch it. Add a lockfile verification step today, restrict PAN-OS management access tonight (it is being exploited, with no patch until mid-May at the earliest), and evaluate the 2-3x inference speedup from multi-token prediction, which landed in vLLM, SGLang, and a llama.cpp PR this week with no measured quality loss.

- Promit, reading as Engineer

Frequently asked

What is slopsquatting and why is it different from typosquatting?
Slopsquatting is when attackers register package names that LLMs reproducibly hallucinate, then wait for AI coding agents to auto-install them. Unlike typosquatting, the developer never makes a typo — the model invents the name on their behalf, and because hallucinations are reproducible across users and sessions, squatting a single fake name yields hits across millions of agent runs.
How do I block AI-introduced malicious dependencies in CI today?
Add a pre-commit or CI gate that rejects any package not already present in the lockfile unless a human-authored commit explicitly modified the manifest. Pair this with disabling auto-install in Copilot, Cursor, and Claude Code, and add a rule that flags any dependency added in the last 7 days without manual review. These are same-day fixes that close the auto-install path slopsquatting depends on.
If PAN-OS has no patch yet, what mitigation actually works?
Restrict management plane access to a tight IP allowlist, VPN, or take it offline entirely — exposure of the management interface is the vulnerability. Patches are not expected until mid-to-late May, so access restriction is the only control. Also audit appliances for unexpected admin sessions, new config commits, and outbound connections to non-update endpoints, since persistence often lands on the firewall itself.
Is the Gemma 4 MTP speedup safe to enable in production?
It is safe to evaluate behind a feature flag, but do not assume the 2-3x number will hold on your traffic. Acceptance rate depends on prompt distribution, and speculative decoding can degrade p99 latency under contention because the drafter competes for KV cache. Benchmark acceptance rate, end-to-end latency at real batch sizes, tail latency, and throughput per GPU before rolling out.
Why does code signing fail to stop attacks like the DAEMON Tools incident?
Code signing proves provenance, not intent — a valid certificate chain only confirms who signed the binary, not whether that signer is still trustworthy. When a legitimate publisher's certificate is compromised, EDR code-integrity checks pass and telemetry gets downgraded as benign. Pin allowlists to certificate thumbprints rather than subject strings, and instrument QUIC egress, since C2 over QUIC ran undetected for 28 days in this case.
