Lessons from an In-Place PostgreSQL 17 to 18 Upgrade
By Michael Cooper · Founder
We moved a populated production database from PostgreSQL 17.9 to 18.3 in place, on Amazon Aurora Serverless v2, and wrote down what bit us and what did not. The headline is a trap that has nothing to do with us and everything to do with how Postgres binds column defaults: after the upgrade, every one of our uuidv7() columns was still calling the old polyfill, and you cannot just drop it. Here are the real numbers.
The numbers up front
| Metric | Value |
|---|---|
| Write-outage during the in-place upgrade | ~10.3 min, self-recovered (no restart) |
| Columns still bound to the polyfill after upgrade | 115 |
| Native vs polyfill uuidv7() - function (1M) | ~396 vs ~1042 ms (2.6×) |
| Native vs polyfill - PK index size (500k) | 15 vs 20 MB (33% bigger) |
| In-place-upgraded DB vs fresh DB - PK insert (500k) | ~2870 vs ~3081 ms (equal) |
| Full audit-vault re-verification after upgrade | 42,007 records, 0 broken |
Isolated SQL benchmarks are median of 5 (500k-row inserts, PG 18.3). Upgrade and verification figures are from the live run, Amazon EKS 1.36 + Aurora Serverless v2, 2026-06-29.
The upgrade: about ten minutes of write-outage, then it self-heals
We set the cluster’s engine version to 18.3 with allow_major_version_upgrade against a populated database and watched the application from the outside. The write path returned 503 for a ~10.3-minute window; total upgrade wall time was about 11.5 minutes. Then the API recovered on its own, with no pod restart: the connection pool reconnected once Aurora came back, and the first post-recovery write succeeded.
The operational lesson is small and useful. Budget a ten-to-twelve minute write window and run the upgrade between workloads. You do not need to babysit the application through it, but you do need the window.
The trap: a column DEFAULT is pinned to a function’s OID
This is the part worth the post, and it is a general PostgreSQL gotcha, not a uuidv7 one. State it without uuidv7 and it sounds obvious in hindsight:
A stored column DEFAULT binds to a function’s OID at parse time. Adding a same-named native function in a later major version does not re-point existing defaults. They keep calling the old one.
Native uuidv7()is new in PostgreSQL 18. Anyone who adopted UUIDv7 on 15 through 17 installed a SQL polyfill (there are a handful of widely-copied gists). We had one too, in the public schema. A fresh 18 install never creates it and resolves uuidv7() straight to pg_catalog.uuidv7. An in-place upgrade does not: the polyfill persists, and every one of our 115 DEFAULT uuidv7() columns stayed bound to it. Inserts kept calling the polyfill, and row counts and “it still works” would never tell you.
The cleanup is where the OID pinning bites a second time. A naive DROP FUNCTION public.uuidv7() fails, because 115 column defaults depend on it. CASCADEwould “succeed” by stripping those defaults, which silently breaks every insert that relied on them. The correct order is to re-point each default first, which rebinds it to the native function, and only then drop the polyfill:
-- Re-point, then drop. The ALTER rebinds the default to pg_catalog.uuidv7;
-- the polyfill has no dependents left, so the DROP is safe.
ALTER TABLE public.records ALTER COLUMN id SET DEFAULT uuidv7();
ALTER TABLE public.audit_log ALTER COLUMN id SET DEFAULT uuidv7();
-- ... every other DEFAULT uuidv7() column ...
DROP FUNCTION public.uuidv7();We do not maintain that list by hand. The migration ships an idempotent, self-discovering block that only acts when both the native function and the polyfill exist, finds the bound columns itself, re-points them, and drops the polyfill. It is in our AWS runbook: Upgrading the Aurora engine (PostgreSQL 17 to 18).
What skipping the cleanup costs you
Isolated SQL benchmarks, 500k-row inserts, median of 5, on PostgreSQL 18.3:
| Operation | Native | Polyfill | Penalty |
|---|---|---|---|
| Function alone (1M calls) | ~396 ms | ~1042 ms | 2.6× slower |
| Heap insert, no index | ~1452 ms | ~1875 ms | ~29% slower |
| Insert with UUID primary key | ~3081 ms | ~3463 ms | ~12% slower |
| Primary-key index size | 15 MB | 20 MB | 33% bigger |
The function is 2.6× slower in isolation, but end to end the gap shrinks, because the insert (and, in our case, signing the record) dominates the call. Through our full signed-insert path the difference was directional: single-record throughput was flat within noise, bulk inserts ran roughly 6 to 12% faster on native (single run, rate limiter off to isolate the engine).
The cost that does not show up as latency is the 33% larger index in this run. The polyfill’s weaker intra-millisecond ordering packs the B-tree worse, so the primary-key index stays bloated until you reindex (which native ordering avoids needing): more storage and more buffer-cache pressure in the meantime. The gap scales with how many rows you write per millisecond, so treat 33% as our number, not a constant. Do not panic over the function’s microseconds, but if you are write-heavy, clean it up.
No scar tissue: for new writes, in-place performs like fresh
The obvious next worry: once cleaned up, does an in-place-upgraded database carry hidden baggage that a fresh install would not? We ran the same native primary-key insert benchmark (500k rows, 5 runs) in the upgraded database and in a brand-new database on the same cluster: ~2870 ms versus ~3081 ms. The upgraded database was, if anything, a hair faster, so call it no penalty.
For new data, an in-place major upgrade performs like a fresh install. The only legacy lives in pre-existing objects: the bound defaults above, and the optimizer statistics below. Not in the engine.
Re-analyze the statistics after the upgrade
After the upgrade our largest table had last_analyze = NULL, and autovacuum re-analyzed it about nine minutes later (upgrade completed ~20:32, autoanalyze fired ~20:41). Be careful what you infer from that: a major upgrade resets the last_analyze timestamp regardless of whether the actual histograms carried over, so a NULL on its own tells you no ANALYZE has run since the upgrade, not that the planner is blind.
PostgreSQL 18 improved pg_upgrade to carry statistics across a major upgrade, and Aurora’s mechanism may differ; we did not confirm whether the carried histograms survived for that table. So we did the boring, reliable thing: run ANALYZE yourself right after a major upgrade instead of waiting for autovacuum, especially on a busy database, and you do not have to wonder.
Aside: we could prove the upgrade did not corrupt anything
One pleasant side effect of what we build. Most stacks answer “did the major upgrade corrupt any data?” with “the application still works.” Because our records are hash-chained and Ed25519-signed, it is a checkable yes or no. After the upgrade we re-verified the entire vault, all 42,007 records, every chain link and signature, with zero breakage, and re-checked a sample offline against an out-of-band key (refusing the server’s own embedded key). It passed. A verifiable data structure answers the corruption question with a proof instead of a vibe.
FAQ
Why does my PostgreSQL major-version upgrade still use the old uuidv7 polyfill instead of the native function?
A column DEFAULT binds to a function OID at parse time. An in-place major upgrade does not re-point existing defaults, so they keep calling the polyfill. Re-point them with ALTER TABLE ... ALTER COLUMN ... SET DEFAULT uuidv7().
Why can I not DROP the uuidv7 polyfill after upgrading to PostgreSQL 18?
The column defaults still resolve to it, so they are dependent objects and a bare DROP FUNCTION fails. CASCADE would strip the defaults and break inserts. Re-point the defaults to the native function first, then drop.
How much downtime does an in-place Aurora 17 to 18 major upgrade take?
In our run, about a 10.3-minute write-outage and ~11.5 minutes wall time. The application self-recovered with no restart once the database returned.
Is the native PostgreSQL 18 uuidv7 faster than a SQL polyfill?
Yes: about 2.6× as a function in isolation, smaller end to end, and the polyfill also leaves a primary-key index about 33% larger.
Takeaways
- After an in-place major upgrade, audit every
DEFAULTthat calls a function you polyfilled. The default is OID-pinned and will keep calling the old one until you re-point it. - Re-point before you drop. A bare
DROP FUNCTIONfails on the dependents andCASCADEis a footgun. - The native uuidv7 win that matters most is not the function speed; it is the tighter, ~33%-smaller primary-key index.
- Budget a ten-to-twelve minute write window for an in-place Aurora major upgrade, and run
ANALYZEyourself afterward rather than waiting for autovacuum.
Sources & further reading
- PostgreSQL 18 release notes (native
uuidv7(), pg_upgrade statistics): postgresql.org - PostgreSQL documentation - UUID functions: postgresql.org
- PostgreSQL documentation - ALTER TABLE, column defaults and dependencies: postgresql.org
- RFC 9562 - UUID Version 7 (time-ordered): rfc-editor.org
- Our AWS install runbook with the idempotent re-point-and-drop block: Install AGLedger on Amazon EKS