pg-boss in production: footguns we hit and how to avoid them
By Michael Cooper · Founder
Four operational gotchas we hit running pg-boss at AGLedger for about a year. One has been fixed upstream. The other three still bite. Reproductions, citations, and the patterns we settled on. Your environment may surface them differently; if you find a cleaner path, we'd like to hear about it.
The short version
We run pg-boss in the low thousands of jobs per minute across federation outbound delivery, webhook fanout, and scheduled maintenance. Four operational footguns hit us repeatedly. This post is one bug per section: the shape, a self-contained reproduction, the upstream issue, and what we did about it.
Tested 2026-05-03 against pg-boss 12.18.2, PostgreSQL 17, Node 24 LTS. If you are on pg-boss earlier than 11.0.8 or 10.4.0, upgrade first: issue #535 is fixed. The footguns below are still live in the latest releases.
The four footguns
1. singletonKey without singletonSeconds is a no-op on standard queues
2. boss.send() returns null on dedup, silently
3. boss.schedule() upserts on (name, key); schedules silently overwrite
4. Schema drift across pg-boss majors (snake_case rename in v10, archive removal in v11)
1. singletonKey without singletonSeconds is a no-op on standard queues
pg-boss 12.x · default queue policy · silent failure mode
Calling boss.send(queue, data, { singletonKey: 'x' }) against a queue using the default standard policy and without singletonSeconds does not deduplicate. The key is treated as a label only. Parallel enqueues all create rows.
The pg-boss jobs API docs are explicit: singletonSeconds is the dedup window. The error mode is silent: callers see a returned id, no exception, and ship. Reproduction:
import PgBoss from 'pg-boss'
const boss = new PgBoss({ connectionString: process.env.DATABASE_URL! })
await boss.start()
try {
await boss.createQueue('demo') // idempotent in v10+
const ids = await Promise.all(
Array.from({ length: 6 }, () =>
boss.send('demo', { x: 1 }, { singletonKey: 'k' }),
),
)
console.log(ids.filter(Boolean).length) // -> 6, not 1
} finally {
await boss.stop()
}

A second path: if you want pure key-based uniqueness without a time window, that is what the singleton, stately, or exclusive queue policies are for. Set the policy at createQueue time and singletonKey will enforce uniqueness via partial indexes (visible in pg-boss plans.js). The footgun is the default standard policy specifically.
References: pg-boss issues #81 (historical singleton context, closed 2018) and #548 (open as of 2026-05-03, discussing replace-vs-discard semantics).
What we did. Wrap boss.send behind a small facade that takes key and ttlSeconds as required positional fields. The facade fills both singletonKey and singletonSeconds atomically, so no path through the facade produces the no-op behavior. A small ESLint rule that flags boss.send(...) calls with singletonKey but no singletonSeconds is also feasible against the AST.
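A minimal sketch of that facade, assuming only the send signature shown in the reproductions above; SendFn and sendOnce are our names, not pg-boss API:

```typescript
// SendFn mirrors the shape of boss.send; in real code you would pass
// boss.send.bind(boss) so the facade stays decoupled from the instance.
type SendFn = (
  queue: string,
  data: object,
  options: { singletonKey: string; singletonSeconds: number },
) => Promise<string | null>

// key and ttlSeconds are required positional fields, so no call path
// through the facade can set singletonKey without singletonSeconds.
export async function sendOnce(
  send: SendFn,
  queue: string,
  data: object,
  key: string,
  ttlSeconds: number,
): Promise<string | null> {
  return send(queue, data, { singletonKey: key, singletonSeconds: ttlSeconds })
}
```

Callers write sendOnce(boss.send.bind(boss), 'demo', payload, 'k', 60) and cannot reproduce the no-op.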
2. boss.send() returns null on dedup, silently
pg-boss 12.x · documented, easy to discard
When pg-boss successfully dedups an enqueue, boss.send() returns null, not a job id, and not an error. Callers that chain on the return value without a null check, as in const id = await boss.send(...); track(id), silently drop the dedup signal.
import PgBoss from 'pg-boss'
const boss = new PgBoss({ connectionString: process.env.DATABASE_URL! })
await boss.start()
try {
await boss.createQueue('demo')
const a = await boss.send('demo', { x: 1 },
{ singletonKey: 'k', singletonSeconds: 60 })
console.log(a) // -> a job id (uuid)
const b = await boss.send('demo', { x: 1 },
{ singletonKey: 'k', singletonSeconds: 60 })
console.log(b) // -> null
} finally {
await boss.stop()
}

This is documented behavior, not a bug. Multiple callsites in our own codebase silently discarded the null; one of them was a metric counting enqueues, and until we caught it the count was off by the dedup rate. In response we landed an ESLint rule that flags discarded return values from queue-send calls.
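The core of that lint check can be sketched as a plain predicate over an ESTree-shaped statement node, so the idea is testable without wiring up eslint; the Node type here is a minimal stand-in for the real ESTree types, and isDiscardedQueueSend is our name:

```typescript
// Flag a bare (possibly awaited) call to .send or .insert used as a
// statement: its return value, including the null dedup signal, is discarded.
type Node = {
  type: string
  expression?: Node
  argument?: Node
  callee?: Node
  property?: { name: string }
}

export function isDiscardedQueueSend(stmt: Node): boolean {
  if (stmt.type !== 'ExpressionStatement' || !stmt.expression) return false
  let expr = stmt.expression
  // `await boss.send(...)` as a statement is still a discarded value.
  if (expr.type === 'AwaitExpression' && expr.argument) expr = expr.argument
  return (
    expr.type === 'CallExpression' &&
    expr.callee?.type === 'MemberExpression' &&
    ['send', 'insert'].includes(expr.callee.property?.name ?? '')
  )
}
```

In a real ESLint rule this predicate lives inside create()'s ExpressionStatement visitor and calls context.report; assignments and chained calls fall outside ExpressionStatement and are not flagged.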
Reference: pg-boss issue #548 discusses replace-vs-discard semantics and is open as of 2026-05-03.
What we did. Treat every boss.send / boss.insert return value as required-to-handle. The simplest version is the lint rule. The runtime version is a wrapper that returns a discriminated union ({ deduped: true } | { id: string }) so the type system forces the branch.
3. boss.schedule() upserts on (name, key) — schedules silently overwrite
pg-boss 12.x · documented, easy to misuse
The pgboss.schedule table's primary key is (name, key). The key parameter to boss.schedule() defaults to ''. If you call boss.schedule('maintenance', cron, data) once per task, all calls share the empty default key. Only the last one survives. Every prior schedule row is upserted away.
import PgBoss from 'pg-boss'
const boss = new PgBoss({ connectionString: process.env.DATABASE_URL! })
await boss.start()
try {
await boss.createQueue('maintenance')
await boss.schedule('maintenance', '*/2 * * * *', { task: 'expiry-sweep' })
await boss.schedule('maintenance', '*/5 * * * *', { task: 'audit-checkpoint' })
await boss.schedule('maintenance', '0 * * * *', { task: 'reputation-roll' })
// Inspect with: SELECT name, key, cron, data FROM pgboss.schedule;
//
// Expected: three rows.
// Actual: one row, name='maintenance', key='', cron='0 * * * *',
// data={task:'reputation-roll'}.
} finally {
await boss.stop()
}

We shipped this bug and did not notice for a while. The schedule that survived was the one we expected to fire most often, so the system looked healthy from the outside. We caught it when an unrelated recovery sweep started failing and we went looking. The schedules that ran were the ones registered last on each boot. The schedules that did not run were the ones we thought were running.
Reference: pg-boss schedule SQL is in plans.js (search for the schedule function); the upsert on (name, key) is visible in the SQL.
What we did. Always pass an explicit, unique key argument to boss.schedule. Wrap it: a helper that takes (boss, queue, cron, data, taskKey) and sets both key: taskKey and singletonKey: taskKey in one call. Never call boss.schedule directly.
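A sketch of that helper, assuming the key and singletonKey options behave as described above; ScheduleFn and scheduleTask are our names:

```typescript
// ScheduleFn mirrors the shape of boss.schedule; pass
// boss.schedule.bind(boss) in real code.
type ScheduleFn = (
  queue: string,
  cron: string,
  data?: object,
  options?: { key: string; singletonKey: string },
) => Promise<void>

export async function scheduleTask(
  schedule: ScheduleFn,
  queue: string,
  cron: string,
  data: object,
  taskKey: string,
): Promise<void> {
  // key makes the (name, key) upsert unique per task; singletonKey
  // dedups the jobs that the schedule later enqueues.
  await schedule(queue, cron, data, { key: taskKey, singletonKey: taskKey })
}
```

With this helper, the three maintenance schedules in the reproduction would occupy three distinct (name, key) rows instead of overwriting one.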
To find schedules riding on the default key in your own database:
SELECT name, key, cron, data FROM pgboss.schedule WHERE key = '';
Note that because (name, key) is the primary key, a GROUP BY name, key HAVING count(*) > 1 check can never fire: the collision leaves only one surviving row. Any row with the empty default key is a candidate, since earlier schedules registered under the same queue name were already upserted away.
4. Schema drift across pg-boss majors
pg-boss v10, v11, v12 · silent breakage in raw SQL
pg-boss has had three breaking schema changes in living memory. v10 introduced partitioned tables, queue policies, and the snake_case column rename (singletonKey → singleton_key, etc.). v11 removed the pgboss.archive table entirely and changed retention semantics; completed jobs now live in pgboss.job with state = 'completed' until deleteAfterSeconds elapses. v12 went ESM-only with named exports.
Anything that referenced old column names or the removed archive table in raw SQL (recovery sweeps, ops scripts, monitoring dashboards) broke silently on the matching upgrade. The v11 shape we hit:
import { Pool } from 'pg'
const pool = new Pool({ connectionString: process.env.DATABASE_URL! })
// A "find old completed jobs" query that worked on pg-boss 10.x:
const completed = await pool.query(`
SELECT id, completed_on FROM pgboss.archive
WHERE name = $1 AND completed_on > now() - interval '7 days'
`, ['demo'])
// On pg-boss <= 10.x: returns rows.
// On pg-boss 11.x+: throws 'relation "pgboss.archive" does not exist'
// — but only when this query actually runs, which for a weekly sweep
// can be days after the upgrade if the sweep is in a try/catch that
// logs-and-continues.
await pool.end()

What we did. A startup sentinel. When boss.start() completes, run one query against information_schema.columns for pgboss.job and assert that every column you reference by string literal in raw SQL is present. Fail boot if a required column is missing. Throwing on boot is strictly cheaper than a sweep that fails silently for days.
const REQUIRED = ['name', 'state', 'singleton_key'] as const
const { rows } = await pool.query<{ column_name: string }>(`
SELECT column_name FROM information_schema.columns
WHERE table_schema = 'pgboss' AND table_name = 'job'
`)
const present = new Set(rows.map(r => r.column_name))
const missing = REQUIRED.filter(c => !present.has(c))
if (missing.length > 0) {
throw new Error(`pgboss.job missing columns: ${missing.join(', ')}`)
}

The same sentinel pattern works as a migrations gate if your ops dashboards or scripts query the pgboss schema directly.
Fixed upstream: issue #535
pg-boss 11.0.8 (2025-10-10) · backport in 10.4.0 (2025-11-19)
We want to flag this one explicitly because the symptoms are subtle and the fix has already shipped. pg-boss issue #535 (closed 2025-10-03): on a stately queue with batch_size > 1, when two jobs sharing a singleton key sat in the created and retry states at the same time, the fetch query would grab both and try to activate them. The unique constraint on the singleton key then failed the activation with UniqueKeyViolationError, and the worker stopped fetching from then on.
If you are on pg-boss earlier than 11.0.8 (or earlier than 10.4.0 on the v10 line), upgrade. To confirm it is not silently affecting you, look for completed jobs sharing a singleton key with completion timestamps within milliseconds of each other:
-- pg-boss v11+ — completed jobs live in pgboss.job until
-- deleteAfterSeconds elapses (the archive table was removed in v11).
SELECT singleton_key, COUNT(*), MIN(completed_on), MAX(completed_on)
FROM pgboss.job
WHERE state = 'completed'
  AND singleton_key IS NOT NULL
  AND completed_on > now() - interval '7 days'
GROUP BY singleton_key
HAVING COUNT(*) > 1
   AND MAX(completed_on) - MIN(completed_on) < interval '5 seconds'
ORDER BY 2 DESC
LIMIT 20;
-- pg-boss v10.x — query pgboss.archive instead (or in addition).
Credit goes upstream. The fix landed in 11.0.8 via the maintainer (timgit) and was backported to the v10 line in PR #640 by @nrempel, released as 10.4.0. We just had to upgrade.
Honest about scope
We do not claim these are the only pg-boss footguns. They are the four that bit us repeatedly in production at the throughput AGLedger runs (low thousands of jobs per minute, federation outbound plus webhook delivery plus maintenance schedules). Lower-throughput deployments may never hit them. Higher-throughput deployments probably hit them sooner.
pg-boss is the Postgres-native queue we picked for AGLedger and would pick again. The maintainer is responsive. Every footgun above has a corresponding GitHub issue, several with active discussion. If you operate pg-boss at scale and found a cleaner pattern for any of these, please open an issue on pg-boss or send us a note. We'd rather update this post than pretend our patterns are the last word.
Sources & further reading
- pg-boss on GitHub (MIT, maintained by timgit)
- pg-boss issue #535 — worker stops with singleton key in retry + created and batch > 1; fixed in 11.0.8 / 10.4.0
- pg-boss issue #548 — replace-vs-discard semantics, open as of 2026-05-03
- pg-boss issue #81 — historical singleton context (closed 2018)
- pg-boss release notes — v10, v11, v12 breaking changes
- PostgreSQL information_schema.columns — reference for the schema sentinel pattern
- Crunchy Data: Postgres information_schema, the other system catalog — background on information_schema vs pg_catalog for the startup-sentinel pattern
- pg-boss v10.0.0 release notes — snake_case column rename, queue policies, partitioned tables
- pg-boss v11.0.0 release notes — archive table removed, retention semantics changed
- Brandur Leach: Postgres as a queue — canonical piece on the broader pattern pg-boss implements
- Brandur Leach: Implementing Stripe-like idempotency keys — relevant to the “treat dedup signals as required-to-handle” pattern in Footgun 2
Related
AGLedger is a self-hosted cryptographic notary for automated work; every record is hash-chained and Ed25519-signed. pg-boss is part of how we get there. Learn more.