8 min read · Guide · Distributed systems

How distributed IDs stay unique without a central counter.

A primary key that nobody coordinates. Four formats — UUID v4, UUID v7, ULID, Snowflake — all solve the same problem differently. The bit layout is the difference.


What are distributed IDs?

Globally unique, no central counter.

Distributed IDs are globally unique identifiers generated without a central counter. The common formats are UUID (RFC 4122, now superseded by RFC 9562), ULID (Alizain Feerasta, 2016), Snowflake (Twitter, 2010), and KSUID (Segment, 2017). This guide compares UUID v4, UUID v7, ULID, and Snowflake in detail. UUID v7, standardised in RFC 9562, is a sensible default for time-ordered keys.

A single database's auto-increment is fine until you have two databases. At that point a shared counter becomes a distributed-systems problem: you either round-trip to a central authority for every ID (slow, since each ID costs a small transaction) or let every node generate IDs on its own and accept that they will not be strictly monotonic across nodes (fast). Distributed IDs are the fast answer.

Three properties matter, often in tension: uniqueness (no collisions across nodes), orderability (newer IDs sort after older ones), and opacity (the ID reveals nothing it shouldn't). Different formats prioritize different combinations.


Compare the four formats bit by bit

UUID, ULID, and Snowflake, side by side.

The viewer draws each format's bit layout as a bar whose segments are sized in proportion to their bit counts.

UUID v7 · 128 bits (bit 0 to bit 127)
unix_ms · 48 bits · millisecond Unix timestamp
version · 4 bits · fixed value 7
rand_a · 12 bits · random
variant · 2 bits · fixed value 10
rand_b · 62 bits · random

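UUID v4's shape with a time prefix glued on. Sorts by creation time down to the millisecond; IDs minted within the same millisecond fall in random order unless the generator adds a counter. The modern default: PostgreSQL 18 ships a native uuidv7() function. Indexes love it.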

UUID v4 · 128b · random
UUID v7 · 128b · time-ordered
ULID · 128b · time-ordered
Snowflake · 64b · time-ordered, coordinated

UUID v4: 128 bits, 122 random and six fixed

Unique, but with no order at all.

The classic. Generate 122 cryptographically random bits, set 4 bits to "version 4" and 2 bits to "variant 10", and there's your ID. Collisions are a practical non-issue: by the birthday bound, you would have to generate 1 billion IDs per second for roughly 86 years before the chance of even one collision reached 50%.
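The collision figure can be checked with the standard birthday-bound approximation. The sketch below is only an estimate under the assumption that all 122 bits are uniformly random; the function names are illustrative, not taken from any library.

```python
import math

RANDOM_BITS = 122  # UUID v4: 128 bits minus 4 version bits and 2 variant bits


def ids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of random IDs needed to reach collision probability p.

    Birthday-bound approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), N = 2**bits.
    """
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


def years_at_rate(n_ids: float, ids_per_second: float) -> float:
    """How long it takes to mint n_ids at a steady rate, in years."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_ids / ids_per_second / seconds_per_year


n_half = ids_for_collision_probability(0.5)
years = years_at_rate(n_half, 1e9)
print(f"{n_half:.2e} IDs, about {years:.0f} years at 1e9 IDs per second")
```

The 50% point lands near 2.7 × 10^18 IDs, which is where the "decades at a billion per second" figure comes from.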

The catch: no order. Two v4s a second apart, sorted lexicographically, don't have any relationship. Used as a primary key, every insert lands at a random page in the B-tree — destroying cache locality, causing constant page splits. This is why "use UUIDs as primary keys" used to be a known anti-pattern.


UUID v7: the new default for time-ordered keys

A time prefix glued onto UUID v4's shape.

RFC 9562 (May 2024) standardised UUID v7: 48 bits of Unix millisecond timestamp at the front, followed by 4 version bits, 2 variant bits, and 74 random bits. Because the timestamp leads, plain lexicographic comparison sorts IDs by creation time, which solves the index-locality problem above, while the 74 random bits guard against collisions within each millisecond.
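To make the layout concrete, here is a minimal sketch that packs the fields by hand. It assumes no monotonic counter within a millisecond (the RFC allows one, this sketch omits it), and uuid7_sketch is a hypothetical helper name, not a standard-library function.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Pack a 128-bit value in the v7 layout: timestamp, version, rand_a, variant, rand_b."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 of them are used
    rand_a = (rand >> 62) & 0xFFF                  # 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit timestamp in the top bits
    value |= 0x7 << 76                             # 4-bit version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # 2-bit variant field = 10
    value |= rand_b
    return uuid.UUID(int=value)


older = uuid7_sketch(1_700_000_000_000)
newer = uuid7_sketch(1_700_000_000_001)
print(older, newer, str(older) < str(newer))
```

Because the timestamp occupies the most significant bits, comparing the canonical hex strings gives the same order as comparing creation times, as long as the two IDs come from different milliseconds.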

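PostgreSQL 18 ships native generation (uuidv7()), and many UUID libraries now offer a v7 generator. If you're still on v4 for primary keys, switching new rows to v7 is often a one-line change in the ID generator, with measurable index performance gains. Existing v4 rows keep their values and still fit the same column type.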


ULID: time-ordered IDs before UUID v7

Same idea, with a cleaner text encoding.

Alizain Feerasta, 2016. Same idea as UUID v7 — 48-bit timestamp, 80-bit random, total 128 bits — with a different textual encoding: Crockford base32, 26 characters, no ambiguous letters (no I/L/O/U). The encoding is what people often pick ULID for: shorter, URL-safe, cleaner to read.
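The encoding is simple enough to sketch directly. The version below assumes plain random bits for the 80-bit tail (no same-millisecond monotonic increment), and ulid_sketch is a hypothetical name rather than the API of any ULID library.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet: digits, then letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_sketch(unix_ms: Optional[int] = None) -> str:
    """Encode a 48-bit timestamp plus 80 random bits as 26 Crockford base32 characters."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")           # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | randomness     # 128 bits in total

    chars = []
    for _ in range(26):                 # 26 * 5 = 130 bits, so the top 2 bits are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(ulid_sketch())
```

The first 10 characters carry the timestamp and the remaining 16 carry the randomness. Since the alphabet is in ascending ASCII order, string comparison matches numeric comparison.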

UUID v7 covers most of ULID's use cases now and benefits from the broader UUID ecosystem (drivers, libraries, tooling). Greenfield projects: pick v7. Existing ULID systems: leave alone — both are operationally fine.


Snowflake: when 128 bits is too many

64 bits that fit in a SQL bigint.

Twitter, 2010. 64 bits, so it fits in a SQL BIGINT at half the storage of a UUID. Layout: 1 sign bit (always 0) + 41 bits of millisecond timestamp counted from a custom epoch + 10 bits of machine ID + 12 bits of per-millisecond sequence. That allows 4096 IDs per millisecond per machine across up to 1024 machines, and 41 bits of milliseconds last roughly 69 years from the chosen epoch.

The catch is coordination. Each generator process must have a unique 10-bit machine ID, assigned externally by Zookeeper, etcd, or operator config (the same problem service discovery solves more generally). If two boxes accidentally share a machine ID, nothing stops them from minting the same ID in the same millisecond. Variants (Sonyflake, Mastodon's flake) tweak the bit allocations.
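A minimal generator along these lines might look like the sketch below. It follows the 41/10/12 split described above; EPOCH_MS is an arbitrary assumption (2020-01-01 UTC), the class and function names are hypothetical, and the machine ID is simply passed in, standing in for whatever external assignment a real deployment uses.

```python
import threading
import time
from typing import Callable, Tuple

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeSketch:
    def __init__(self, machine_id: int, clock: Callable[[], int] = _now_ms):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to mint IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4096 slots for this millisecond are used: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._machine_id << 12) | self._sequence


def decode(snowflake: int) -> Tuple[int, int, int]:
    """Split an ID back into (unix_ms, machine_id, sequence)."""
    return ((snowflake >> 22) + EPOCH_MS, (snowflake >> 12) & 0x3FF, snowflake & 0xFFF)


gen = SnowflakeSketch(machine_id=5)
print(decode(gen.next_id()))
```

The injectable clock is there only to make the sketch testable. Note that nothing inside the generator can detect a duplicated machine ID, which is why the assignment has to be enforced from outside.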


Time-ordered IDs leak the creation time

Sometimes that is a privacy problem.

The ordering property cuts both ways. A v7 / ULID / Snowflake ID embeds the creation time, with millisecond precision, in the open. If your ID is in a public URL, anyone can read the millisecond when the resource was created.
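Reading the time back takes only a shift. The sketch below does it for a UUID v7 using the standard uuid module; the function name and the sample ID (zeroed random bits, for readability) are made up for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_created_at(value: str) -> datetime:
    """Recover the embedded creation time from a version 7 UUID string."""
    parsed = uuid.UUID(value)
    if parsed.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = parsed.int >> 80          # the top 48 bits are the millisecond timestamp
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


# Illustrative ID with an all-zero random part.
print(uuid7_created_at("018bcfe5-6800-7000-8000-000000000000"))
```

No key or secret is involved, so anyone holding the ID can do the same. ULIDs and Snowflake IDs give up their timestamps just as easily, though for a Snowflake you also need to know the generator's epoch.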

For most apps that's fine — the creation time isn't secret. For some apps (private medical records, pre-launch invitations, anything time-sensitive) it's a leak. UUID v4 is the answer there: pure randomness, no inference. Use v4 when the creation time itself must not leak — the same way OAuth opaque tokens hide their issuance time.


How to choose a distributed ID format

It comes down to two questions.

One: do you need 128 bits, or is 64 enough? Two: can your ID afford to reveal its creation time?

UUID v7 · 128b · time
The default for new systems. Sorts well, indexes well, no coordination required, full UUID ecosystem.

UUID v4 · 128b · opaque
Use when ID-borne time would leak something private. Accepts the index-locality cost in exchange for opacity.

Snowflake · 64b · time
Use when storage doubling matters (high-volume tables, hot in-memory indexes). Pay the coordination cost: every generator needs a unique machine ID.


Where each ID format came from

Production systems and the ID format they invented.

Twitter Snowflake · 2010
Designed when Twitter moved off MySQL auto-increments. 64 bits: 1 sign bit, 41-bit timestamp (ms since a custom epoch), 10-bit machine ID, 12-bit sequence. Tops out at about 4.1M IDs/sec/node (4096 per millisecond). Open-sourced; the layout was adapted by Discord (with its own epoch), Sony (Sonyflake, with a different bit split), and Instagram (with a shard identifier).
MongoDB ObjectId · 2009
96 bits: 4-byte timestamp + 5-byte random + 3-byte counter. The default _id for every MongoDB document. Time-sortable to one-second resolution and smaller than a UUID. Many JSON-document stores adopted similar formats.
Segment KSUID · 2017
160 bits: 32-bit timestamp + 128-bit random payload. Base62-encoded to 27 characters, which is URL-safe and sorts by creation time to one-second resolution. Segment used it for event tracking; it combines naturally with Stripe-style type prefixes in API IDs (ksuid_xxx).
ULID · 2016 (Alizain Feerasta)
128 bits: 48-bit timestamp + 80-bit random. Crockford-base32 encoded for case-insensitive readability. Effectively UUID v7 before the IETF spec; some systems have moved from ULID to UUID v7 since the latter was standardised in 2024.
UUID v7 · RFC 9562, 2024
The official IETF time-ordered UUID. 48-bit Unix-ms timestamp + 74 random bits + 6 version/variant bits. Drop-in replacement for v4 in any UUID-aware system. PostgreSQL 18 ships uuidv7(); many language libraries added v7 support around 2024.

The "Stripe ID" pattern. Stripe's IDs (cus_xxx, ch_xxx, pi_xxx) are not a single format: each is a type prefix followed by a base62-style suffix. The prefix tells humans the type; the suffix is the actual unique ID (the generator behind it is internal to Stripe, so any resemblance to KSUID is a guess). Many B2B APIs adopted this style after Stripe's; it's one of the most recognisable design choices in modern SaaS.


How ID format affects database performance

Why time-ordered IDs make B-trees happy.

The B-tree insertion problem. A random UUID v4 inserted into a B-tree index lands in a random leaf page; consecutive inserts touch different pages, blowing the buffer cache. A time-ordered ID (UUID v7, ULID, Snowflake) appends to the right edge of the index; inserts cluster in a small set of hot pages. Reported throughput differences are in the 2-5× range on insert-heavy workloads, and the gap grows as the index outgrows the buffer cache.
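The locality difference shows up even in a toy model. The sketch below uses a sorted Python list as a stand-in for the leaf order of a B-tree and counts how many inserts land at the right edge. It does not measure throughput, and the key generators only approximate v4 and v7 (16 random bytes, and a 6-byte counter standing in for the millisecond clock, one ID per millisecond).

```python
import bisect
import os
from typing import List, Sequence


def right_edge_fraction(keys: Sequence[bytes]) -> float:
    """Fraction of inserts that land at the very end of a sorted index."""
    index: List[bytes] = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appended += 1               # insert touches only the rightmost position
        index.insert(pos, key)
    return appended / len(keys)


N = 5000
random_keys = [os.urandom(16) for _ in range(N)]                           # v4-like
time_keys = [ms.to_bytes(6, "big") + os.urandom(10) for ms in range(N)]    # v7-like

print("random keys:      ", right_edge_fraction(random_keys))
print("time-ordered keys:", right_edge_fraction(time_keys))
```

With time-ordered keys every insert is an append, while with random keys almost none are. In a real index that is the difference between writing to one hot page and touching a different page on nearly every insert.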

Real measurements. Daniel Bartholomew (MariaDB) published in 2023: 1M-row insert into a UUID v4 PRIMARY KEY took 247 seconds; the same with UUID v7 took 76 seconds — a 3.2× speedup, with the only difference being ID format. Postgres' community benchmarks show similar 2-4× gains.

The other direction: time-ordered IDs leak time. Users enumerated by ID can be ordered by signup time. For public IDs (e.g. /orders/<id>) this can be a privacy or competitive issue: anyone who collects a sample of IDs can read off when each resource was created, and a Snowflake's sequence bits hint at how busy a generator was in that millisecond. Opaque or random IDs on public surfaces avoid this.

The pragmatic compromise: use a time-ordered ID internally for performance, and expose an opaque slug or hashed ID externally. Many production systems do exactly this, keeping the sortable key inside the database and mapping it to an opaque public identifier at the API boundary.



A closing note

Distributed IDs feel like an obscure topic until you realise every record in every system you build has one. Pick the format consciously, once, at the design stage. Changing it later, once IDs are baked into foreign keys and URLs, is brutal. The bit layouts above are the entire mental model; the rest is library choice.
