
Building an npm CVE patching task for Bosun

Dogfooding Bosun for npm advisory response: audit, triage, grouping, patching, verification, and reviewer evidence as CVEs land.

by Timon Vonk

CVE-2026-45321 covers 84 malicious versions across 42 @tanstack/* packages published to npm in a six-minute window. StepSecurity’s Mini Shai-Hulud write-up walks through a self-spreading npm supply-chain attack. Bitwarden confirmed a malicious @bitwarden/cli@2026.4.0 npm distribution window tied to CVE-2026-42994.

npm security has been on fire lately. I mostly use Node for frontend work, but even then, across all our projects, I still spend about an hour a week on it. I’ve heard more worrying stories from friends working on backends (especially legacy ones), where patch time suddenly becomes a target and roadmap time goes out the window. Frontend or not, I still want to remediate as soon as possible. It’s a structured, repeatable chunk of work, and that’s exactly what Bosun is for.

Spend all my free time fixing vulnerabilities

Often, the fix is a single version pin or lockfile update and we’re good. But the time really goes into:

  • Switching context
  • Reading and understanding the advisory
  • Verifying whether we could have been attacked (which might involve more than just code)
  • Checking whether the issue needs regression tests
  • Checking whether the update needs code changes
  • Checking that the affected code path is covered by tests
  • Grouping CVEs per package to avoid lockfile hell
  • Checking that the fixed version has been released for more than 24 hours
  • Keeping the update minimal where possible
  • Double-checking that no new transitive or exotic dependencies appear
  • Double-checking that no weird new build scripts appear
  • Changing code, if needed
  • Opening a pull request with reasoning and results
  • … or an issue with a note on fixing it later

That interruption is really expensive. A CVE alert is rarely just a one-shot package bump. Someone has to work out whether the repo is affected, whether the dependency is direct or transitive, whether the fix is mature, whether the lock file diff is sane, whether install scripts changed, whether tests still pass, and what to tell the reviewer.

Nobody has a spare afternoon for every advisory that crosses the desk. I prefer to wake up with solutions, not problems.

So I figured, why not automate it with Bosun itself? I can add all the structure I need and be sure that the path was actually followed. The tasks should run on a short interval, so new advisories are audited and triaged as soon as package-manager advisory data surfaces them. Bosun always creates the tracking issue, and if a fix is available and passes policy, it can also patch the dependency and open a reviewable PR without waiting for someone to notice the alert. And we can make the task available to anyone using Bosun as well.

In the past couple of days I’ve already merged four PRs. My own time is down to minutes per week, and time-to-patch is down to minutes :tada:.

The automation breaks it down into two tasks:

  • A dispatcher task audits the repo on a schedule, creates or updates tracking issues, groups related advisories, and dispatches remediation only when a fix is eligible.
  • A remediation task patches one advisory group, verifies the result, comments on the linked issues, and opens the pull request.

Dispatcher task

Bosun CVE audit dispatcher task canvas

The dispatcher finds package roots, audits them, normalizes advisories, applies policy, writes issues, groups related CVEs, and dispatches one remediation task per ready group.

The dispatcher starts with repository discovery. It finds package.json files, lockfiles, workspace config, and packageManager metadata. Each package root becomes a separate audit target because package-manager behavior is local to the root. A monorepo can have one npm root, several pnpm workspaces, stale lockfiles, or mixed history from migrations.
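
A minimal sketch of the discovery step, assuming a package root is any directory that holds both a package.json and a lockfile. It reads the Corepack `packageManager` field to break ties and records leftover lockfiles from migrations instead of guessing. Workspace members without their own lockfile are audited through their root's lockfile, and workspace config files such as pnpm-workspace.yaml are not parsed here. `discoverPackageRoots` and `AuditTarget` are hypothetical names, not Bosun's API.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join, relative } from "node:path";

type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

interface AuditTarget {
  packageRoot: string;      // relative to the repository root
  packageManager: PackageManager;
  lockfile: string;         // the lockfile this target is audited against
  extraLockfiles: string[]; // other lockfiles in the same root, likely migration leftovers
}

// Lockfile name -> package manager, checked in this order.
const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lock", "bun"],
  ["package-lock.json", "npm"],
];
const SKIP_DIRS = new Set(["node_modules", ".git"]);

// Reads the Corepack-style "packageManager" field, e.g. "pnpm@9.12.0" -> "pnpm".
function declaredManager(dir: string): PackageManager | undefined {
  const manifest = JSON.parse(readFileSync(join(dir, "package.json"), "utf8"));
  const name = String(manifest.packageManager ?? "").split("@")[0];
  return LOCKFILES.some(([, pm]) => pm === name) ? (name as PackageManager) : undefined;
}

export function discoverPackageRoots(repo: string, dir: string = repo): AuditTarget[] {
  const targets: AuditTarget[] = [];
  const found = LOCKFILES.filter(([file]) => existsSync(join(dir, file)));
  if (existsSync(join(dir, "package.json")) && found.length > 0) {
    // Trust the declared manager when its lockfile is present; otherwise take the first match.
    const declared = declaredManager(dir);
    const [file, pm] = found.find(([, m]) => m === declared) ?? found[0];
    targets.push({
      packageRoot: relative(repo, dir) || ".",
      packageManager: pm,
      lockfile: relative(repo, join(dir, file)),
      extraLockfiles: found.filter(([f]) => f !== file).map(([f]) => relative(repo, join(dir, f))),
    });
  }
  // Recurse into subdirectories, skipping installed dependencies and git metadata.
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory() && !SKIP_DIRS.has(entry.name)) {
      targets.push(...discoverPackageRoots(repo, join(dir, entry.name)));
    }
  }
  return targets;
}
```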

The audit stage runs the configured package manager for each target. The output is treated as data. Lifecycle scripts are disabled where the package manager supports it, because audit collection should not execute package behavior. The task asks for machine-readable output and keeps raw output as an attachment or evidence field for later review.
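
The audit stage could look roughly like this sketch. The command lines are the JSON modes of each CLI (Yarn here means Yarn Berry's `yarn npm audit`; `bun audit` assumes a Bun release that ships it), while `runAudit` and the evidence shape are hypothetical. The `npm_config_ignore_scripts` variable is an extra safeguard assumed to be honored by npm and pnpm; audit itself should not run lifecycle scripts. The detail worth keeping is that a non-zero exit is not treated as failure, since audit commands typically exit non-zero precisely when they find something.

```typescript
import { spawnSync } from "node:child_process";

type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Machine-readable audit invocations. Yarn 1 would use `yarn audit --json` instead.
const AUDIT_ARGV: Record<PackageManager, string[]> = {
  npm: ["npm", "audit", "--json"],
  pnpm: ["pnpm", "audit", "--json"],
  yarn: ["yarn", "npm", "audit", "--json"],
  bun: ["bun", "audit", "--json"],
};

interface AuditEvidence {
  command: string;
  exitCode: number | null;
  raw: string;         // stdout kept verbatim as reviewer evidence
  spawnError?: string; // set when the command could not run at all
}

export function runAudit(packageRoot: string, pm: PackageManager): AuditEvidence {
  const [bin, ...args] = AUDIT_ARGV[pm];
  const result = spawnSync(bin, args, {
    cwd: packageRoot,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // large monorepo reports can exceed the 1 MiB default
    // Assumed to be read by npm and pnpm; Yarn Berry uses its own YARN_* settings.
    env: { ...process.env, npm_config_ignore_scripts: "true" },
  });
  // A non-zero exit usually means "vulnerabilities found", not "audit broken",
  // so stdout is kept either way and parsed later by the normalization stage.
  return {
    command: [bin, ...args].join(" "),
    exitCode: result.status,
    raw: result.stdout ?? "",
    spawnError: result.error?.message,
  };
}
```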

The normalization stage converts package-manager output into one advisory record:

advisory:
  id: CVE-2026-0000
  advisory_urls:
    - https://nvd.nist.gov/vuln/detail/CVE-2026-0000
  package_name: "@acme/session-cache"
  package_root: apps/dashboard
  package_manager: pnpm
  lockfile: apps/dashboard/pnpm-lock.yaml
  dependency_scope: runtime
  dependency_path:
    - dashboard
    - "@acme/auth-ui"
    - "@acme/session-cache"
  vulnerable_range: "<4.3.2"
  fixed_versions:
    - 4.3.2
  severity: high
  summary: Short plain-language explanation for the issue and reviewer.

That record gives later agents a contract. The remediation task should not need to know whether the original data came from npm, pnpm, Yarn, or Bun.
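
A sketch of normalization for one source, the `npm audit --json` report from npm 7+ (`auditReportVersion: 2`), into the record above. Only fields the report carries are filled in. The logical dependency path, scope, fixed versions, and any CVE alias of the GHSA id are left for triage. The interface and function names are hypothetical.

```typescript
type Severity = "info" | "low" | "moderate" | "high" | "critical";

interface AdvisoryRecord {
  id: string;
  advisory_urls: string[];
  package_name: string;
  package_root: string;
  package_manager: "npm" | "pnpm" | "yarn" | "bun";
  lockfile: string;
  dependency_scope: "runtime" | "dev" | "unknown";
  dependency_path: string[];
  vulnerable_range: string;
  fixed_versions: string[];
  severity: Severity;
  summary: string;
}

// Subset of the npm audit report (auditReportVersion 2) that this sketch reads.
interface NpmAuditVia { name: string; title: string; url: string; severity: Severity; range: string; }
interface NpmAuditReport {
  auditReportVersion: number;
  vulnerabilities: Record<string, { name: string; via: Array<string | NpmAuditVia> }>;
}

export function normalizeNpmAudit(raw: string, packageRoot: string, lockfile: string): AdvisoryRecord[] {
  const report = JSON.parse(raw) as NpmAuditReport;
  if (report.auditReportVersion !== 2) {
    throw new Error(`unsupported npm audit report version ${report.auditReportVersion}`);
  }
  const records: AdvisoryRecord[] = [];
  for (const vuln of Object.values(report.vulnerabilities ?? {})) {
    for (const via of vuln.via) {
      // String entries only point at another vulnerable package; the advisory lives there.
      if (typeof via === "string") continue;
      records.push({
        id: via.url.split("/").pop() ?? via.url, // GHSA id; CVE aliases are resolved in triage
        advisory_urls: [via.url],
        package_name: via.name,
        package_root: packageRoot,
        package_manager: "npm",
        lockfile,
        dependency_scope: "unknown", // filled in by triage
        dependency_path: [],         // filled in by triage
        vulnerable_range: via.range,
        fixed_versions: [],          // filled in by triage
        severity: via.severity,
        summary: via.title,
      });
    }
  }
  return records;
}
```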

Triage fills the fields audit output cannot prove by itself. The task verifies advisory links, checks whether the affected range matches the installed version, reads registry publish times, identifies fixed versions, and applies the release-age policy. The default grace period is 24 hours. If every fixing version is younger than the grace period, the dispatcher creates or updates the issue and records the earliest time the group becomes eligible.
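
The release-age gate reduces to a small pure function. This sketch assumes publish times come from the registry's per-version `time` map (what `npm view <pkg> time --json` prints) and that the fixing versions were already identified; `applyReleaseAgePolicy` is a hypothetical name.

```typescript
// Per-version publish times, e.g. the registry package metadata's "time" map.
type PublishTimes = Record<string, string>;

interface ReleaseAgeDecision {
  eligible: boolean;
  readyVersions: string[];           // fixing versions already older than the grace period
  earliestEligibleAt: string | null; // set only while no fixing version has cleared the gate
}

export function applyReleaseAgePolicy(
  fixingVersions: string[],
  times: PublishTimes,
  now: Date,
  gracePeriodHours: number = 24,
): ReleaseAgeDecision {
  const graceMs = gracePeriodHours * 60 * 60 * 1000;
  const gates = fixingVersions
    // Versions without a known publish time are never eligible.
    .filter((v) => Object.prototype.hasOwnProperty.call(times, v))
    .map((v) => ({ version: v, clearsAt: Date.parse(times[v]) + graceMs }));
  const readyVersions = gates.filter((g) => g.clearsAt <= now.getTime()).map((g) => g.version);
  if (readyVersions.length > 0 || gates.length === 0) {
    return { eligible: readyVersions.length > 0, readyVersions, earliestEligibleAt: null };
  }
  // Nothing is ready yet: report when the first fixing version clears the gate.
  const first = Math.min(...gates.map((g) => g.clearsAt));
  return { eligible: false, readyVersions, earliestEligibleAt: new Date(first).toISOString() };
}
```

With the example dates used in this post, a 4.3.2 published at 2026-05-16T10:30:00Z stays ineligible with `earliestEligibleAt` set to `2026-05-17T10:30:00.000Z`, which is the earliest automatic patch time the issue reports.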

If needed, we can also give our triage agent access to logs, alerts, or other relevant context.

Grouping happens before dispatch. The group key is:

package_root + lockfile + package_manager + package_name

If one package has three CVEs in the same root, we create one remediation group and one pull request, which avoids lockfile conflicts between multiple pull requests. We still create one issue per CVE, so each one stays visible without becoming a distraction.
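
Grouping is a keyed bucket over normalized records. In this sketch the lockfile part of the key is taken relative to the package root, an assumption made to match the `group_key` in the payload example; `groupKey` and `groupAdvisories` are hypothetical names.

```typescript
import { posix } from "node:path";

interface GroupableAdvisory {
  id: string;
  package_name: string;
  package_root: string;
  package_manager: string;
  lockfile: string;
}

// package_root + lockfile + package_manager + package_name, joined with "|".
export function groupKey(a: GroupableAdvisory): string {
  const lockfile = posix.relative(a.package_root, a.lockfile);
  return [a.package_root, lockfile, a.package_manager, a.package_name].join("|");
}

// One bucket per key: each bucket becomes one remediation group and one pull request.
export function groupAdvisories<T extends GroupableAdvisory>(advisories: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const advisory of advisories) {
    const key = groupKey(advisory);
    const bucket = groups.get(key) ?? [];
    bucket.push(advisory);
    groups.set(key, bucket);
  }
  return groups;
}
```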

For each group, the dispatch payload looks like this:

advisory_group:
  group_key: apps/dashboard|pnpm-lock.yaml|pnpm|@acme/session-cache
  package_name: "@acme/session-cache"
  package_root: apps/dashboard
  package_manager: pnpm
  lockfile: apps/dashboard/pnpm-lock.yaml
  selected_version: 4.3.2
  selected_version_published_at: 2026-05-16T10:30:00Z
  release_age_policy:
    grace_period_hours: 24
    eligible: true
  advisories:
    - id: CVE-2026-0000
      advisory_urls:
        - https://nvd.nist.gov/vuln/detail/CVE-2026-0000
      linked_issues:
        - 123
  safety_flags:
    exotic_dependency_source: false
    new_build_script_approval_required: false
  required_checks:
    - audit
    - package-manager-install
    - configured-tests

Only eligible groups are dispatched. Every run should refresh the advisory state, but only ready work should start a coding agent.
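
Filling `selected_version` for a ready group means picking the lowest release that falls outside every grouped advisory's vulnerable range. The sketch assumes plain `x.y.z` versions and simple comparator ranges (`<4.3.2`, `>=4.0.0 <4.3.2`, `||` alternatives). `inRange` is a deliberately small stand-in for the `semver` package's `satisfies` and ignores prereleases, carets, tildes, and x-ranges. It also refuses to cross a major version, in line with sending semver-major fixes back to the issue. All names are hypothetical.

```typescript
type Triple = [number, number, number];

const PLAIN = /^v?(\d+)\.(\d+)\.(\d+)$/;

function parse(v: string): Triple {
  const m = PLAIN.exec(v.trim());
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function compare(a: string, b: string): number {
  const x = parse(a);
  const y = parse(b);
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// Minimal stand-in for semver.satisfies: "||"-separated sets of space-separated comparators.
function inRange(version: string, range: string): boolean {
  return range.split("||").some((set) =>
    set.trim().split(/\s+/).every((comparator) => {
      const m = /^(<=|>=|<|>|=)?(.+)$/.exec(comparator);
      if (!m) return false;
      const c = compare(version, m[2]);
      switch (m[1]) {
        case "<": return c < 0;
        case "<=": return c <= 0;
        case ">": return c > 0;
        case ">=": return c >= 0;
        default: return c === 0;
      }
    }),
  );
}

// Lowest same-major release above the installed one that is outside every vulnerable range.
// null means no automatic fix: the group stays on the issue for a human decision.
export function selectVersion(installed: string, available: string[], vulnerableRanges: string[]): string | null {
  const major = parse(installed)[0];
  const candidates = available
    .filter((v) => PLAIN.test(v)) // skip prereleases and unusual tags
    .filter((v) => parse(v)[0] === major && compare(v, installed) > 0)
    .filter((v) => vulnerableRanges.every((range) => !inRange(v, range)))
    .sort(compare);
  return candidates.length > 0 ? candidates[0] : null;
}
```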

Issue output

The issue is where we park the investigation state before a patch exists. It has to be useful even if the remediation task never runs.

Currently that format looks like this:

## Advisory
- CVE: CVE-2026-0000
- Source: https://nvd.nist.gov/vuln/detail/CVE-2026-0000
- Package: @acme/session-cache
- Affected range: <4.3.2
- Fixed version considered: 4.3.2
## What it means
Short explanation in plain language: this package stores session data used by the dashboard login flow; the advisory allows cache poisoning through a crafted key; this repository is affected because the package is loaded at runtime through @acme/auth-ui.
## Repository impact
- Package root: apps/dashboard
- Package manager: pnpm
- Dependency path: dashboard -> @acme/auth-ui -> @acme/session-cache
- Scope: runtime
## Current decision
- Status: waiting for release-age gate
- Grace period: 24 hours
- Earliest automatic patch time: 2026-05-17 10:30 UTC
- Human override: reasonable if the package is exposed in production or the advisory is being actively exploited.
## Proof plan
- Re-run audit after patch
- Run configured tests
- Inspect lockfile diff for unrelated graph churn
- Check package-manager safety flags

That gives us enough context to override, wait, or review the later PR without reconstructing the advisory from scratch.
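
Because every run refreshes the advisory state, the issue body is easiest to keep consistent when it is rendered deterministically from the record, so a re-run updates the existing issue instead of opening a new one. This sketch embeds a hidden HTML comment as the lookup key. That marker, the field names, and both functions are assumptions, and the override and proof-plan sections are omitted for brevity.

```typescript
interface IssueFacts {
  id: string;
  url: string;
  packageName: string;
  affectedRange: string;
  fixedVersion: string | null;
  meaning: string;
  packageRoot: string;
  packageManager: string;
  dependencyPath: string[];
  scope: string;
  status: string;
  gracePeriodHours: number;
  earliestPatchAt: string | null;
}

// Hypothetical hidden marker that ties an issue to one advisory id.
const marker = (id: string): string => `<!-- bosun-cve:${id} -->`;

export function renderIssue(f: IssueFacts): string {
  return [
    marker(f.id),
    "## Advisory",
    `- CVE: ${f.id}`,
    `- Source: ${f.url}`,
    `- Package: ${f.packageName}`,
    `- Affected range: ${f.affectedRange}`,
    `- Fixed version considered: ${f.fixedVersion ?? "none yet"}`,
    "## What it means",
    f.meaning,
    "## Repository impact",
    `- Package root: ${f.packageRoot}`,
    `- Package manager: ${f.packageManager}`,
    `- Dependency path: ${f.dependencyPath.join(" -> ")}`,
    `- Scope: ${f.scope}`,
    "## Current decision",
    `- Status: ${f.status}`,
    `- Grace period: ${f.gracePeriodHours} hours`,
    `- Earliest automatic patch time: ${f.earliestPatchAt ?? "n/a"}`,
  ].join("\n");
}

// Create-or-update: return the existing issue for this advisory, if there is one.
export function findTrackingIssue<T extends { number: number; body: string }>(issues: T[], id: string): T | undefined {
  return issues.find((issue) => issue.body.includes(marker(id)));
}
```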

Remediation task

Bosun CVE remediation task canvas

The remediation task receives one advisory group, revalidates the facts, patches the package, verifies the result, comments on the linked issues, and opens the PR.

The first step is revalidation. The remediation task treats dispatcher output as evidence, then checks the current repository and registry state again. The selected version must still exist, must still satisfy every advisory in the group, and must still pass the release-age policy. If the lockfile changed since dispatch, the task re-evaluates the group instead of applying a stale command. Each task runs with its own checkout, inside a hardened microVM.
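
One way to detect a stale dispatch is for the dispatcher to record a digest of the lockfile and for the remediation task to compare it before patching. In this sketch `lockfile_sha256` is a hypothetical payload field (it is not in the payload example), and the advisory and release-age checks are passed in as booleans computed elsewhere.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

export function lockfileDigest(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Hypothetical subset of the dispatch payload used for revalidation.
interface DispatchedGroup {
  lockfile: string;
  lockfile_sha256: string;
  selected_version: string;
}

type Revalidation = { ok: true } | { ok: false; reason: string };

export function revalidate(
  group: DispatchedGroup,
  publishedVersions: string[],
  fixesEveryAdvisory: boolean,
  passesReleaseAge: boolean,
): Revalidation {
  // A changed lockfile means the dispatcher's facts may be stale: re-evaluate, do not patch.
  if (lockfileDigest(group.lockfile) !== group.lockfile_sha256) {
    return { ok: false, reason: "lockfile changed since dispatch: re-evaluate the group" };
  }
  if (!publishedVersions.includes(group.selected_version)) {
    return { ok: false, reason: `${group.selected_version} is no longer published` };
  }
  if (!fixesEveryAdvisory) return { ok: false, reason: "selected version no longer fixes every advisory" };
  if (!passesReleaseAge) return { ok: false, reason: "release-age policy not satisfied" };
  return { ok: true };
}
```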

Command selection follows package-manager policy. The task chooses from a small command set:

  • direct dependency: update the manifest and lockfile with the package manager
  • transitive dependency: prefer a minimal lockfile update when supported
  • unavailable transitive fix: add or update overrides or resolutions when the package manager supports it
  • semver-major or unclear graph churn: stop and request review through the issue
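
The command set above maps onto a small decision function. This sketch returns a strategy rather than a shell command. `parentRangeAllowsFix` is an assumed input meaning the fixed version already fits the range declared by the dependent package, so a lockfile-only update can pick it up. The override field names are the ones each manager reads from package.json (pnpm 10 also accepts overrides in pnpm-workspace.yaml); detecting unclear graph churn is left out.

```typescript
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

type Strategy =
  | { kind: "direct-update" }
  | { kind: "lockfile-update" }
  | { kind: "override"; field: string }
  | { kind: "stop"; reason: string };

// package.json field each manager reads forced versions from.
const OVERRIDE_FIELD: Record<PackageManager, string> = {
  npm: "overrides",
  pnpm: "pnpm.overrides",
  yarn: "resolutions",
  bun: "overrides",
};

interface StrategyInput {
  pm: PackageManager;
  direct: boolean;               // listed in the root's package.json
  majorBump: boolean;            // selected version crosses a semver major
  parentRangeAllowsFix: boolean; // fix already fits the range declared by the dependent package
}

export function chooseStrategy(input: StrategyInput): Strategy {
  if (input.majorBump) {
    return { kind: "stop", reason: "semver-major fix: request review through the issue" };
  }
  if (input.direct) return { kind: "direct-update" };
  if (input.parentRangeAllowsFix) return { kind: "lockfile-update" };
  return { kind: "override", field: OVERRIDE_FIELD[input.pm] };
}
```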

Install and update commands run with lifecycle scripts disabled where possible. Safety signals, such as build-script approvals and exotic dependency sources, are inspected after the change. The task should not approve a new build script automatically. If the patch requires one, the PR calls it out as a reviewer decision.
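
Two of these safety signals can be checked with plain data comparisons. The sketch flags dependency specs that resolve outside the registry (git hosts, GitHub shorthand, local paths, raw tarball URLs) and reports packages newly added to a build-script allowlist such as pnpm's `onlyBuiltDependencies`. It only inspects manifest dependency maps; checking resolved sources inside the lockfile would need a per-format parser. The function names are hypothetical.

```typescript
// Spec prefixes that resolve outside the npm registry: git hosts, local paths, raw tarballs.
const EXOTIC_PREFIXES = [
  "git+", "git:", "github:", "gitlab:", "bitbucket:", "gist:",
  "file:", "link:", "portal:", "http:", "https:",
];

export function isExoticSpec(spec: string): boolean {
  const s = spec.trim();
  // Registry aliases and in-repo workspace references are not exotic sources.
  if (s.startsWith("npm:") || s.startsWith("workspace:")) return false;
  if (EXOTIC_PREFIXES.some((prefix) => s.startsWith(prefix))) return true;
  if (/^(\.{1,2}\/|\/|~\/)/.test(s)) return true; // directory or tarball on disk
  return /^[\w.-]+\/[\w.-]+(#.*)?$/.test(s);       // "user/repo" GitHub shorthand
}

// Dependency names whose spec is exotic now but was not before the change.
export function newExoticSources(before: Record<string, string>, after: Record<string, string>): string[] {
  return Object.keys(after).filter(
    (name) => isExoticSpec(after[name]) && !isExoticSpec(before[name] ?? ""),
  );
}

// Packages newly allowed to run install scripts. Each entry returned here is a
// reviewer decision to call out in the PR, never something approved automatically.
export function newBuildApprovals(before: string[], after: string[]): string[] {
  return after.filter((name) => !before.includes(name));
}
```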

We want to make the minimum change to resolve the advisory:

  • the package manifest for the affected root
  • the matching lockfile
  • an override or resolution when needed
  • a targeted test only when the vulnerability requires behavior coverage
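
Keeping the change minimal is easy to enforce mechanically on the list of changed paths, for example from `git diff --name-only`. In this sketch the test-file pattern is a hypothetical convention, and an override is assumed to live in the root's package.json so it needs no extra allowance; a pnpm 10 setup that keeps overrides in pnpm-workspace.yaml would have to allow that file too.

```typescript
import { posix } from "node:path";

// Hypothetical convention for targeted test files (e.g. session.test.ts, cache.spec.mjs).
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;

// Returns changed paths outside what a remediation is allowed to touch:
// the root's package.json, its lockfile, and optionally test files inside the root.
export function unexpectedChanges(
  changed: string[],
  packageRoot: string,
  lockfile: string,
  allowTests: boolean,
): string[] {
  const allowed = new Set([posix.join(packageRoot, "package.json"), lockfile]);
  return changed.filter((file) => {
    if (allowed.has(file)) return false;
    const insideRoot = file.startsWith(packageRoot + "/");
    return !(allowTests && insideRoot && TEST_FILE.test(file));
  });
}
```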

Verification

Verification is a separate stage with read-only intent.

The verifier confirms:

  • the advisory URLs are present in the issue and PR
  • the installed version resolves every advisory in the group
  • the audit command no longer reports the grouped advisories
  • configured tests pass, or failures are copied into the PR with scope
  • the lockfile diff only moves the expected package graph
  • no new git, file, HTTP, or directory dependency source (an exotic dependency) appeared without being flagged
  • no new build-script approval was added without being flagged
  • the selected version satisfies the release-age policy, or the PR says why a human override was used
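
The lockfile-diff check can be sketched as a set difference over resolved `name@version` entries. For brevity this reads npm's package-lock.json (lockfile version 2 or 3), whose `packages` map is keyed by install path; the pnpm lockfile in the example would need a YAML parser instead. The `expected` set, standing in for the patched package and its required subgraph, is assumed to be computed elsewhere, and both function names are hypothetical.

```typescript
interface PackageLock {
  lockfileVersion: number;
  packages?: Record<string, { version?: string }>;
}

const NM = "node_modules/";

// Resolved "name@version" entries from a package-lock.json (v2/v3) "packages" map.
export function resolvedPackages(lockJson: string): Set<string> {
  const lock = JSON.parse(lockJson) as PackageLock;
  if (!lock.packages) throw new Error("expected lockfileVersion 2 or 3 with a packages map");
  const out = new Set<string>();
  for (const [path, meta] of Object.entries(lock.packages)) {
    const idx = path.lastIndexOf(NM);
    // "" is the root project; paths without node_modules/ are workspace sources.
    if (idx === -1 || !meta.version) continue;
    out.add(`${path.slice(idx + NM.length)}@${meta.version}`);
  }
  return out;
}

// Added/removed entries whose package name is not in the expected set.
export function unexpectedGraphChanges(before: Set<string>, after: Set<string>, expected: Set<string>): string[] {
  const nameOf = (entry: string) => entry.slice(0, entry.lastIndexOf("@"));
  const removed = Array.from(before).filter((e) => !after.has(e)).map((e) => `- ${e}`);
  const added = Array.from(after).filter((e) => !before.has(e)).map((e) => `+ ${e}`);
  return removed.concat(added).filter((line) => !expected.has(nameOf(line.slice(2))));
}
```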

This stage is what lets me hit merge on the pull request when it lands. Since no human is involved until review, I’m more skeptical of what the task produces and concludes. An audit trail, clear commits, and a clear explanation help me regain that trust.

PR output

The PR body is part of the remediation. We want the reviewer to see what changed, why that version was chosen, and how the task proved the advisory group is gone.

## Summary
Patches @acme/session-cache in apps/dashboard from 4.3.1 to 4.3.2.
## Advisories
- CVE-2026-0000: https://nvd.nist.gov/vuln/detail/CVE-2026-0000
- GHSA-xxxx-yyyy-zzzz: https://github.com/advisories/GHSA-xxxx-yyyy-zzzz
## Why this version
- 4.3.2 is the earliest version that satisfies every grouped advisory.
- Published at 2026-05-16 10:30 UTC.
- Release-age policy: 24 hours, satisfied.
## Repository impact
- Package root: apps/dashboard
- Package manager: pnpm
- Dependency path: dashboard -> @acme/auth-ui -> @acme/session-cache
- Scope: runtime
## Changed files
- apps/dashboard/package.json
- apps/dashboard/pnpm-lock.yaml
## Verification
- pnpm audit --json: grouped advisories no longer reported
- pnpm install --ignore-scripts --lockfile-only: passed
- pnpm test: passed
## Safety notes
- No new build-script approval detected.
- No exotic dependency source detected.
- Lockfile diff limited to @acme/session-cache and its required subgraph.
## Linked issues
Closes #123.

The issue comment gets a shorter version of the same evidence and links back to the PR.

What we are learning by dogfooding it

Dogfooding this gave me the opportunity to dig into what it takes to remediate fast, especially when the patch is written by an agent rather than by a trusted engineer. That means we need more proof, a low barrier to review, and no distractions, so I can merge fast.

Our recent dispatch feature (triggering tasks dynamically from other tasks) has been helpful here. In large projects, I can imagine also spinning up remediation in subprojects and linking it all together.

Also, the recent Shai-Hulud attacks (<3 Dune) have shown how crucial it is to isolate untrusted code. They moved our Firecracker decision from ‘this is cool and fast’ to ‘wouldn’t dream of running without it’.

Because we can break complex, repeatable work into smaller, digestible steps, hybrid agent/deterministic solutions apply in far more places than I anticipated, with predictable results.

Next

We are going to keep running this against our own npm projects, see what else we need, and use the same workflow as a template for our other toolchains.

If you want to compare notes on CVE response or try the task on your own repos, reach out directly, or join us on Discord. Grill us relentlessly with strong opinions about dependency security, because those opinions are exactly what this task should encode.