Share tokens instead of IDs

2026-05-26

When tracking a collection of entities in a relational database, it's common to use sequential identifiers (IDs) as the primary key, starting from 1 and counting up. These IDs are fine for uniquely identifying an entity in a local database, or upholding a foreign key relationship with other entities in the same database, but start causing problems once they cross a service boundary:

  • They're ambiguous. IDs just look like numbers. A database error log mentions 1779590843... is that an order or an invoice? It kinda looks like a timestamp.
  • They expose your internals. A customer who receives Invoice 0038 immediately knows how many invoices you've sent. Anyone with this information can reason about your data structure, and it may be possible to use enumeration attacks to access records belonging to other customers.
  • They're bound to a single source. ID sequences are tightly bound to a single entity table in a database. Merge entities, or split a monolith into new services, and the same identifier may exist in multiple sources with no way to tell them apart.

Figure: two service domains, each owning its own entity tables.

Let's take a simple e-commerce example, with two services. We have an Orders Service, which serves as the domain for Order and Payment entities. We have the Deliveries Service, which serves as the domain for Delivery entities.

Integer IDs are fine for primary and foreign keys in a single service...
...but they become less valuable bridging between services, as seen here:

  🧑 Customer 9032: im still waiting for 3044 from 2185
  Support agent: Sure. I'll need the delivery and order IDs.
  🧑 Customer 9032: i just told you!

Outside of these domains, these IDs completely lose their meaning.

Given how poorly raw IDs travel across service boundaries, a better approach is to share tokens: identifiers that are safe to share externally and do not leak information about how entities are stored. This article shares a strategy for using sequential entity IDs to generate compact, non-enumerable, type-prefixed tokens.

Order ID      Order token
1             O_Q4BDJQ
2             O_ZCXQ8F
3             O_2PN7M8
1000          O_WHE1FY
1073741823    O_VNT7CW

Instead, we prefer to share tokens...

  🧑 Customer C_7XEV43: im still waiting for D_1hj4ln from O_32onOi
  Support agent: Sure, let me look up your delivery and order.

...which makes everything much clearer.

These tokens are:

  • Ciphered. Sequential IDs result in completely different tokens.
  • Encoded. The compact format uses non-ambiguous characters, and is forgiving in its decoding process.
  • Prefixed. A known prefix tracks the entity type. Even with no other context, D_NM0X1X is clearly a Delivery token, and O_Q4BDJQ is clearly an Order token.

These tokens are NOT:

  • Secure. The ciphering and encoding process obfuscates a numeric ID into a compact token, but the process can be reversed and doesn't meaningfully hide the underlying ID.
  • A replacement for IDs. We don't want to rely on deriving IDs from tokens, in case we change the process in the future to support multiple data sources.

Keen to try it out? Use the generator below, or check out the library code at GitHub: TassSinclair/tokens.

[Interactive token generator. Example output: C_71VHDH]

Worked example: Token lookups

Consider the example above, where a support agent is investigating a customer's delivery issue:

  🧑 Customer C_7XEV43: im still waiting for D_1hj4ln from O_32onOi
  Support agent: Sure, let me look up your delivery and order.

We start with a Delivery Token and Order Token.
Service calls quietly canonicalise the tokens and return the correct entities.

  Support agent: I see Order O_320N01 is still open, Delivery D_1HJ41N is due to arrive on Wednesday.
  🧑 Customer C_7XEV43: ok thanks!

The support agent can respond with minimal overhead.

In this example, entity IDs are not shared outside of the core domain, and service APIs rely on tokens. Tokens are used in customer-facing artefacts, such as invoices.

Why tokens, versus UUIDs?

Random (version 4) UUIDs also avoid enumeration and information leakage, but at 36 characters they're hard to communicate verbally, and they carry no type information. They're still valuable for uniquely identifying entity records, but similar to IDs, should not be handled outside of the service that owns the entities.

How does this work?

Ciphering, encoding, and prefixing each operate as independent steps chained in sequence. Let's take customer entities as an example.

Customer ID   Cipher (seed 41434539)   Encoding   Prefixed Customer token   Canonicalisation (example)
1             236832177                71VHDH     C_71VHDH                  C_7iuhdh
2             55488065                 1MXBJ1     C_1MXBJ1                  C_imxbjl
3             1065536837               ZR5KA5     C_ZR5KA5                  C_zr5ka5
10            185554103                5GYN5Q     C_5GYN5Q                  C_5gyn5q
1000          346451463                AACVG7     C_AACVG7                  C_aacvg7
53765282      2                        000002     C_000002
379690282     1                        000001     C_000001                  C_00000i
1073741823    421086739                CHJHGK     C_CHJHGK                  C_ckjhgk

Converting Customer IDs to Customer tokens, and canonicalising Customer Tokens from user input.

Ciphering

In a process where a sequence of customers with IDs [ 100, 101, 102 ] would normally map to customer tokens [ C_000034, C_000035, C_000036 ], it's trivial to start guessing nearby tokens. We can mask this relationship by using a substitution cipher to obscure the ID before converting it into a token.

We want a deterministic substitution cipher that uses the same input and output domain. This kind of cipher takes a number in a range and maps it to somewhere else in the same range, with no two inputs sharing an output, so every ID gets a unique token and the mapping can be reversed. To visualise an example, the Caesar cipher shifts the input "left" by a certain distance, wrapping around to the end of the domain:

[Interactive demo: Caesar cipher, with adjustable domain size and left shift.]
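
The left shift described above can be sketched in a few lines of Kotlin. This is only an illustration: the function names caesarShift and caesarUnshift are made up for this example and are not taken from the tokens library.

```kotlin
// Illustrative sketch: caesarShift and caesarUnshift are hypothetical names.

/** Shifts [value] left by [shift] within the range 0 until [domain], wrapping around. */
fun caesarShift(value: Long, shift: Long, domain: Long): Long {
    require(domain > 0) { "domain must be positive" }
    require(value in 0 until domain) { "value must be inside the domain" }
    // mod() keeps the result non-negative when value - shift dips below zero.
    return (value - shift).mod(domain)
}

/** Reverses [caesarShift] by shifting right by the same distance. */
fun caesarUnshift(value: Long, shift: Long, domain: Long): Long {
    require(domain > 0) { "domain must be positive" }
    require(value in 0 until domain) { "value must be inside the domain" }
    return (value + shift).mod(domain)
}
```

With a domain of 10 and a left shift of 3, the inputs 0, 1, 2, 3 map to 7, 8, 9, 0. Neighbouring inputs stay neighbours, which is why a plain shift is not enough to hide a sequence.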

Feistel cipher

In practice, we want a cipher that scrambles the mapping, so we use a Feistel cipher to encrypt the integer ID before the Base32 encoding step. With enough rounds, a Feistel network provides diffusion: a one-bit change in the input can affect any bit of the output, so neighbouring IDs produce unrelated-looking tokens.

[Interactive demo: Feistel cipher, with adjustable domain size and seed.]

For our customer tokens, we've set the "domain" space as 32^6 (1,073,741,824 values), which matches our imposed token constraint of six Base32 characters. We can set a different seed for each entity type, so they follow different token sequences. The "seed" is not a secret; it is hard-coded for each token type to ensure consistent results.

Customer ID   Cipher (seed 41434539)   Cipher (seed 41434540)
1             236832177                82650702
2             55488065                 773480176
3             1065536837               592394359
10            185554103                1064651039
1000          346451463                6045569
53765282      2                        874379223
379690282     1                        285382121
1073741823    421086739                1063701256

Ciphering masks the relationship between IDs and tokens

See GitHub: TassSinclair/tokens/FeistelCipher.kt for full implementation details.
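
To make the idea concrete, here is a minimal sketch of a balanced Feistel network over a 30-bit domain (32^6 = 2^30, split into two 15-bit halves). It is not the implementation in FeistelCipher.kt: the round function, the round count and every name below are assumptions chosen for illustration, so its outputs will not match the tables in this article.

```kotlin
// Sketch only: roundFunction, ROUNDS and the mixing constants are assumptions,
// not the algorithm used by the tokens library.

const val HALF_BITS = 15        // two 15-bit halves cover 2^30 = 32^6 values
const val HALF_MASK = 0x7FFF    // lowest 15 bits
const val MAX_VALUE = 0x3FFFFFFF // 2^30 - 1
const val ROUNDS = 4

/** Hypothetical round function: mixes one half with the seed and the round number. */
fun roundFunction(half: Int, seed: Int, round: Int): Int {
    var x = half * 0x45D9F3B + seed + round * 0x9E3779B
    x = x xor (x ushr 16)
    x *= 0x45D9F3B
    x = x xor (x ushr 13)
    return x and HALF_MASK
}

/** Maps a 30-bit ID to another 30-bit value; distinct inputs give distinct outputs. */
fun feistelEncrypt(id: Int, seed: Int): Int {
    require(id in 0..MAX_VALUE) { "id must fit in 30 bits" }
    var left = id ushr HALF_BITS
    var right = id and HALF_MASK
    for (round in 0 until ROUNDS) {
        val next = left xor roundFunction(right, seed, round)
        left = right
        right = next
    }
    return (left shl HALF_BITS) or right
}

/** Runs the rounds in reverse order, recovering the original ID. */
fun feistelDecrypt(value: Int, seed: Int): Int {
    require(value in 0..MAX_VALUE) { "value must fit in 30 bits" }
    var left = value ushr HALF_BITS
    var right = value and HALF_MASK
    for (round in ROUNDS - 1 downTo 0) {
        val previous = right xor roundFunction(left, seed, round)
        right = left
        left = previous
    }
    return (left shl HALF_BITS) or right
}
```

The structure guarantees a reversible mapping whatever the round function does, because each round only XORs one half with a value derived from the other half. That also means anyone holding the seed and the algorithm can run feistelDecrypt.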

Encoding

Base32 encoding lets us represent identifiers with a larger set of human-parsable characters. We use Douglas Crockford's Base32 encoding, which represents integers with a 32-symbol alphabet (0123456789ABCDEFGHJKMNPQRSTVWXYZ) instead of the ten Base10 digits (0123456789). Note that we've constrained the domain to six Base32 characters, so the maximum ID value supported is 32^6 - 1 = 1,073,741,823.

  • When converting an ID to a token, this step also left-pads the resulting token, so all tokens become the same length.
  • When parsing a token from input, this step canonicalises confusing and lowercased characters into their formal representation (for example, i and l are canonicalised to 1).

ID            Encoding (without cipher)   Canonicalisation (example)
1             000001                      00000i
2             000002
3             000003
10            00000A                      00000a
1000          0000Z8                      0000z8
1073741823    ZZZZZZ                      ZzzZzz

Encoding IDs as fixed-width Base32 tokens
[Interactive demo: canonicalising a prefixed token. Example: C_71VHDH]
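
The padding and canonicalisation behaviour described above can be sketched as follows. The function names are hypothetical, and the canonicalisation covers only the substitutions in Crockford's specification (lowercase to uppercase, I and L to 1, O to 0). The article's examples suggest the library is more forgiving than that (for instance, treating u as V); those extra mappings are not reproduced here.

```kotlin
// Sketch only: function names are illustrative, not the tokens library's API.

const val CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

/** Encodes a non-negative value as fixed-width Crockford Base32, left-padded with zeros. */
fun encodeBase32(value: Long, length: Int = 6): String {
    require(value >= 0) { "value must be non-negative" }
    val chars = CharArray(length) { '0' }
    var remaining = value
    for (i in length - 1 downTo 0) {
        chars[i] = CROCKFORD_ALPHABET[(remaining % 32).toInt()]
        remaining /= 32
    }
    require(remaining == 0L) { "value does not fit in $length characters" }
    return chars.concatToString()
}

/** Uppercases the input and maps I/L to 1 and O to 0, as Crockford's spec allows. */
fun canonicaliseBase32(input: String): String =
    input.uppercase().map { c ->
        when (c) {
            'I', 'L' -> '1'
            'O' -> '0'
            else -> c
        }
    }.joinToString("")

/** Decodes forgiving user input back into the number it represents. */
fun decodeBase32(input: String): Long =
    canonicaliseBase32(input).fold(0L) { acc, c ->
        val digit = CROCKFORD_ALPHABET.indexOf(c)
        require(digit >= 0) { "invalid Base32 character: $c" }
        acc * 32 + digit
    }
```

For example, encodeBase32(1000) gives 0000Z8, and decodeBase32("0000z8") gives 1000 again, matching the table above.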

Encoding makes tokens safer to communicate over lossy media, such as over the phone, or scribbled on pieces of paper. One last step reduces confusion further.

Prefixing

Finally, each entity is represented by a short prefix, such as "C" for customer, "O" for order, or "INV" for invoice. In code, we create a token type for each of these, which also hard-codes the cipher seed.

This gives us instant type identification, reduces the risk of accidentally using the wrong token in the wrong place, and ensures tokens from different types never collide even if they share an underlying ID.

Entity-scoped tokens, with prefix, token length and cipher seed constants.
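
As a rough sketch of what such per-entity constants might look like, the block below bundles a prefix, token length and seed into one type. The TokenType class and its helpers are hypothetical. Only the customer seed (41434539) comes from this article; the order and invoice seeds are placeholders, and the lengths are inferred from the example tokens.

```kotlin
// Sketch only: TokenType, format and bodyOf are hypothetical names.

/** Per-entity configuration: prefix, number of Base32 characters, and cipher seed. */
data class TokenType(val prefix: String, val length: Int, val seed: Int)

val CUSTOMER = TokenType(prefix = "C", length = 6, seed = 41434539)
val ORDER = TokenType(prefix = "O", length = 6, seed = 1)     // placeholder seed, not from the article
val INVOICE = TokenType(prefix = "INV", length = 8, seed = 2) // placeholder seed, not from the article

/** Joins a prefix and an encoded body, for example "C" and "71VHDH" into "C_71VHDH". */
fun TokenType.format(body: String): String {
    require(body.length == length) { "expected $length characters, got ${body.length}" }
    return "${prefix}_$body"
}

/** Strips the prefix from a token, rejecting tokens of another type or length. */
fun TokenType.bodyOf(token: String): String {
    val expected = "${prefix}_"
    require(token.startsWith(expected, ignoreCase = true)) { "expected a $prefix token, got $token" }
    val body = token.substring(expected.length)
    require(body.length == length) { "expected $length characters after the prefix" }
    return body
}
```

Keeping all three constants in one place means a token's type fully determines how it is ciphered, how long it is, and how it is labelled.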

In previous examples, we've set the cipher domain and token size to six Base32 characters, allowing us to represent IDs up to 32^6 - 1 = 1,073,741,823. But this is an arbitrary limit. Tokens with seven or eight Base32 characters would push the ID ceiling to 32^7 - 1 = 34,359,738,367 or 32^8 - 1 = 1,099,511,627,775 respectively.

With entity-specific prefixes, lengths, and cipher seeds, see how the same ID sequence is represented against our example token types:

Example ID      Prefixed Customer token   Prefixed Order token   Prefixed Invoice token
1               C_71VHDH                  O_Q4BDJQ               INV_ZJP9WNJA
2               C_1MXBJ1                  O_ZCXQ8F               INV_M9GZ29A2
3               C_ZR5KA5                  O_2PN7M8               INV_D9MEBYC4
10              C_5GYN5Q                  O_W88WFZ               INV_FE3T0SS5
1000            C_AACVG7                  O_WHE1FY               INV_4PGCFN8B
1073741823      C_CHJHGK                  O_VNT7CW               INV_8R95M5RK
1099511627775   (out of bounds)           (out of bounds)        INV_9G26G99K

Token prefixes, seeds, and lengths result in very different mappings.

Bonus: Type safety

As an added benefit, these strongly-typed entity tokens improve type safety in our code. Consider the method signatures below:

fun getDeliveryForOrder(deliveryToken: String, orderToken: String) // these could be anything!

fun getDeliveryForOrder(deliveryToken: Token, orderToken: Token) // better, but still possible to switch them.

fun getDeliveryForOrder(deliveryToken: DeliveryToken, orderToken: OrderToken) // best, mistakes are caught immediately.
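
One way to get the third signature in Kotlin is with value classes, which wrap a String without the cost of a full object in most call paths. This is a sketch under that assumption; the article does not say how the library defines DeliveryToken and OrderToken, and describeLookup is a stand-in for a real service call.

```kotlin
// Sketch only: these definitions are assumptions, not the tokens library's types.

@JvmInline
value class DeliveryToken(val value: String) {
    init {
        require(value.startsWith("D_")) { "not a delivery token: $value" }
    }
}

@JvmInline
value class OrderToken(val value: String) {
    init {
        require(value.startsWith("O_")) { "not an order token: $value" }
    }
}

/** Hypothetical stand-in for a lookup that needs one token of each type. */
fun describeLookup(deliveryToken: DeliveryToken, orderToken: OrderToken): String =
    "delivery ${deliveryToken.value} for order ${orderToken.value}"

// describeLookup(OrderToken("O_320N01"), DeliveryToken("D_1HJ41N"))
// does not compile: the argument types are the wrong way round.
```

The compiler catches swapped arguments, and the init checks catch a token of the wrong type being wrapped in the first place.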

End-to-end considerations

Each of the steps above becomes part of the process in converting an ID to a token.

Now that we have a solid understanding of how each step works, let's review some end-to-end considerations.

Security

Tokens solve an operational problem, not a security problem. The ciphering step prevents casual inference: a customer seeing C_71VHDH would not easily guess how many customers signed up before them, or that C_1MXBJ1 belongs to the next customer.

This cipher is fully reversible. Anyone with the Feistel seed can recover the original integer ID from any token. The seed is hard-coded as a configuration constant, visible in source code and deployment artefacts, and shouldn't be treated as a cryptographic secret.

Abstraction

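Tokens don't replace IDs as the internal identifier. Tokens are persisted alongside their entities rather than re-derived from the ID on every request, so if the token conversion process changes (for example, if a service migrates from sequential integer IDs to UUIDs), persisted tokens remain stable.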

Consider a situation where a second customer database is brought into scope, using UUIDs as primary keys rather than sequential integers. Both databases could represent customer entities using CustomerTokens, with a different output format to avoid clashes with the original format. Code that accepts a CustomerToken doesn't need to know which source a given token came from, as long as it matches one of the expected formats.

  • Customer entities with integer IDs use the process discussed above, and end up with six-character Base32-encoded tokens.
  • Customer entities with UUIDs use a different process (for example, with uuid-base58), and end up with 22-character Base58-encoded tokens.

Source              Key                                    Customer token
DB 1 (integer ID)   10                                     C_5GYN5Q
DB 2 (UUID)         550e8400-e29b-41d4-a716-446655440000   C_3NRpJeT1HaqFxQ5dJRaZBG

Token types can span multiple sources; callers only ever see a CustomerToken.
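
If a service ever does need to tell the two formats apart (for example, to route a lookup to the right database), the body length and alphabet are enough. The sketch below assumes the two formats described above and the common Bitcoin-style Base58 alphabet; CustomerSource and customerSourceOf are hypothetical names.

```kotlin
// Sketch only: names and the Base58 alphabet choice are assumptions.

enum class CustomerSource { INTEGER_ID, UUID }

const val BASE32_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
const val BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

/** Classifies an already-canonicalised customer token, or returns null if it matches neither format. */
fun customerSourceOf(token: String): CustomerSource? {
    if (!token.startsWith("C_")) return null
    val body = token.removePrefix("C_")
    return when {
        body.length == 6 && body.all { it in BASE32_ALPHABET } -> CustomerSource.INTEGER_ID
        body.length == 22 && body.all { it in BASE58_ALPHABET } -> CustomerSource.UUID
        else -> null
    }
}
```

One wrinkle: Base58 is case-sensitive, so the forgiving Base32 canonicalisation should only be applied once a token is known to be in the six-character format.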

Persistence also allows individual tokens to be assigned directly, without going through the cipher at all. In test environments you often want predictable, readable tokens for known fixtures, rather than whatever the cipher happens to produce for a given ID:

INSERT INTO customers (token, name, ...) VALUES ('C_TEST_SENDER', 'Sandy Sender', ...);

Error detection

Crockford's Base32 specification defines an optional check symbol that can be appended to any encoded value, which is computed as value mod 37, mapped to one of 37 symbols (the standard 32 characters plus five extras: *~$=U). This catches the most common transcription mistakes, such as a single wrong character, or two adjacent characters swapped. We can try this out using the generator from before:

[Interactive token generator with check symbol. Example output: C_71VHDHR]
The check symbol is the final character in the token.

Appending a check symbol lengthens the token by one character, but means a service entrypoint can reject malformed tokens during validation, instead of performing a database lookup that will never match:

[Interactive token validator. Example input: C_71VHDH]
The check symbol helps us validate tokens easily.
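
The mod 37 calculation is small enough to sketch directly. The function names here are hypothetical, and the check is computed over the encoded (ciphered) value, which agrees with the C_71VHDHR example: 71VHDH decodes to 236832177, and 236832177 mod 37 = 24, which is the symbol R.

```kotlin
// Sketch only: function names are illustrative, not the tokens library's API.

// The 32 Crockford symbols followed by the five check-only symbols.
const val CHECK_SYMBOLS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~\$=U"

/** Returns the check symbol for a value: value mod 37, mapped onto 37 symbols. */
fun checkSymbol(value: Long): Char {
    require(value >= 0) { "value must be non-negative" }
    return CHECK_SYMBOLS[(value % 37).toInt()]
}

/** Validates a canonicalised token body whose final character is a check symbol. */
fun hasValidCheckSymbol(bodyWithCheck: String): Boolean {
    if (bodyWithCheck.length < 2) return false
    var value = 0L
    for (c in bodyWithCheck.dropLast(1)) {
        val digit = CHECK_SYMBOLS.indexOf(c)
        // Only the first 32 symbols are valid inside the body.
        if (digit !in 0..31) return false
        value = value * 32 + digit
    }
    return checkSymbol(value) == bodyWithCheck.last()
}
```

A service entrypoint could call hasValidCheckSymbol before touching the database: 71VHDHR passes, while 71VHDJR (one wrong character) and 17VHDHR (two characters swapped) are rejected.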

Check symbols are an optional extension. Use them if you need to communicate tokens over the phone, or on handwritten notes. Skip them when tokens are only ever generated and consumed programmatically.

In conclusion

This approach works most effectively in systems where entity identifiers cross service boundaries, or appear in customer-facing contexts.

  • Ciphering makes tokens visually distinct, so people are less likely to confuse sequential entities.
  • Encoding makes tokens more visually compact, and easier to communicate.
  • Prefixes reduce token ambiguity when working with multiple entity types.

Keep IDs doing what they do best - uniquely identifying entity records in a database, and enforcing referential integrity between entities. Outside of the service boundary, prefer tokens. For further reading, check out my reference implementation at GitHub: TassSinclair/tokens. This is the approach we use in OmniTabz systems, inspired by the Base32 tokens we used at Cash App.


If you have feedback or questions about this article, let's catch up via Mastodon, LinkedIn, or email.
