
Share tokens instead of IDs
When tracking a collection of entities in a relational database, it's common to use sequential identifiers (IDs) as the primary key, starting from 1 and counting up. These IDs are fine for uniquely identifying an entity in a local database, or upholding a foreign key relationship with other entities in the same database, but start causing problems once they cross a service boundary:
- They're ambiguous. IDs just look like numbers. A database error log mentions 1779590843... is that an order or an invoice? It kinda looks like a timestamp.
- They expose your internals. A customer who receives Invoice 0038 immediately knows how many invoices you've sent. Anyone with this information can reason about your data structure, and it may be possible to use enumeration attacks to access records belonging to other customers.
- They're bound to a single source. ID sequences are tightly bound to a single entity table in a database. Merge entities, or split a monolith into new services, and the same identifier may exist in multiple sources with no way to tell them apart.
Let's take a simple e-commerce example, with two services. We have an Orders Service, which serves as the domain for Order and Payment entities. We have the Deliveries Service, which serves as the domain for Delivery entities.
- 🧑Customer 9032
- im still waiting for 3044 from 2185
- Sure. I'll need the delivery and order IDs.
- i just told you!
Given how poorly raw IDs travel across service boundaries, a better approach is to share tokens. Generate tokens that are externally-sharable, and do not leak information about the entity storage. This article shares a strategy for using sequential entity IDs to generate compact, non-enumerable, type-prefixed tokens.
| Order ID | → | Order Token |
|---|---|---|
| 1 | → | O_Q4BDJQ |
| 2 | → | O_ZCXQ8F |
| 3 | → | O_2PN7M8 |
| 1000 | → | O_WHE1FY |
| 1073741823 | → | O_VNT7CW |
- 🧑Customer C_7XEV43
- im still waiting for D_1hj4ln from O_32onOi
- Sure, let me look up your delivery and order.
These tokens are:
- Ciphered. Sequential IDs result in completely different tokens.
- Encoded. The compact format uses non-ambiguous characters, and is forgiving in its decoding process.
- Prefixed. A known prefix tracks the entity type. Even with no other context, D_NM0X1X is clearly a Delivery token, and O_Q4BDJQ is clearly an Order token.
These tokens are NOT:
- Secure. The ciphering and encoding process obfuscates a numeric ID into a compact token, but the process can be reversed and doesn't meaningfully hide the underlying ID.
- A replacement for IDs. We don't want to rely on deriving IDs from tokens, in case we change the process in the future to support multiple data sources.
Keen to try it out? Use the generator below, or check out the library code at GitHub: TassSinclair/tokens.
Worked example: Token lookups
Consider the example above, where a support agent is investigating a customer's delivery issue:
- 🧑Customer C_7XEV43
- im still waiting for D_1hj4ln from O_32onOi
- Sure, let me look up your delivery and order.
- 🧑Customer C_7XEV43
- I see Order O_320N01 is still open, Delivery D_1HJ41N is due to arrive on Wednesday.
- ok thanks!
In this example, entity IDs are not shared outside of the core domain, and service APIs rely on tokens. Tokens are used in customer-facing artefacts, such as invoices.
Why tokens, versus UUIDs?
UUIDs also avoid enumeration and information leakage, but at 36 characters they're hard to communicate verbally and carry no type information. They're still valuable for uniquely identifying entity records, but similar to IDs, should not be handled outside of the service that owns the entities.
How does this work?
Ciphering, encoding, and prefixing each operate as independent steps chained in sequence. Let's take customer entities as an example.
| Customer ID | → | Cipher (seed 41434539) | → | Encoding | → | Prefixed Customer token | ← | Canonicalisation (example) |
|---|---|---|---|---|---|---|---|---|
| 1 | → | 236832177 | → | 71VHDH | → | C_71VHDH | ← | C_7iuhdh |
| 2 | → | 55488065 | → | 1MXBJ1 | → | C_1MXBJ1 | ← | C_imxbjl |
| 3 | → | 1065536837 | → | ZR5KA5 | → | C_ZR5KA5 | ← | C_zr5ka5 |
| 10 | → | 185554103 | → | 5GYN5Q | → | C_5GYN5Q | ← | C_5gyn5q |
| 1000 | → | 346451463 | → | AACVG7 | → | C_AACVG7 | ← | C_aacvg7 |
| 53765282 | → | 2 | → | 000002 | → | C_000002 | | |
| 379690282 | → | 1 | → | 000001 | → | C_000001 | ← | C_00000i |
| 1073741823 | → | 421086739 | → | CHJHGK | → | C_CHJHGK | ← | C_ckjhgk |
Ciphering
Without a cipher, a sequence of customers with IDs [ 100, 101, 102 ] would map to customer tokens [ C_000034, C_000035, C_000036 ], which makes it trivial to guess nearby tokens. We can mask this relationship by using a substitution cipher to obscure the ID before converting it into a token.
We want a deterministic substitution cipher that uses the same input and output domain. This kind of cipher takes a number in a range, and maps it to somewhere else in the same range. To visualise an example, the Caesar cipher shifts the input "left" by a certain distance, wrapping around to the end of the domain:
[Figure: Caesar cipher]
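To make the Caesar-style shift concrete, here is a minimal Kotlin sketch. The function names `caesarShift` and `caesarUnshift` are hypothetical and are not part of the library; the sketch only illustrates a mapping that stays inside the same numeric domain.

```kotlin
// Hypothetical helper: shifts an ID "left" within the domain [0, domainSize), wrapping around.
fun caesarShift(id: Long, shift: Long, domainSize: Long): Long {
    require(id in 0L until domainSize) { "id must be within the domain" }
    // mod() is a floored modulus, so small IDs wrap to the end of the domain
    // instead of becoming negative.
    return (id - shift).mod(domainSize)
}

// Reverses caesarShift by moving the same distance in the opposite direction.
fun caesarUnshift(value: Long, shift: Long, domainSize: Long): Long =
    (value + shift).mod(domainSize)
```

With a domain of 1000 and a shift of 7, IDs 100, 101, and 102 become 93, 94, and 95, while ID 3 wraps around to 996. Neighbouring IDs remain neighbours, so a plain shift does nothing to stop someone guessing adjacent values.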
In practice, we want a cipher that scrambles the mapping, so we use a Feistel cipher to encrypt the integer ID before the Base32 encoding step. A Feistel network provides diffusion: a one-bit change in the input can change any bit of the output, so neighbouring IDs produce unrelated-looking tokens.
[Figure: Feistel cipher]
For our customer tokens, we've set the "domain" space as 32^6, which matches our imposed token constraint of six Base32 characters. We can set a different seed for each entity type, so they follow different token sequences. The "seed" is not a secret; it is hard-coded for each token type to ensure consistent results.
| Customer ID | Cipher (seed 41434539) | Cipher (seed 41434540) |
|---|---|---|
| 1 | 236832177 | 82650702 |
| 2 | 55488065 | 773480176 |
| 3 | 1065536837 | 592394359 |
| 10 | 185554103 | 1064651039 |
| 1000 | 346451463 | 6045569 |
| 53765282 | 2 | 874379223 |
| 379690282 | 1 | 285382121 |
| 1073741823 | 421086739 | 1063701256 |
See GitHub: TassSinclair/tokens/FeistelCipher.kt for full implementation details.
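For readers who want the shape of the technique without opening the repository, the following is a minimal sketch of a balanced Feistel network in Kotlin. It is not the code from FeistelCipher.kt: the round function, the round count, and all names here are assumptions chosen for illustration, so its outputs will not match the tables in this article. Because 32^6 equals 2^30, the domain splits cleanly into two 15-bit halves.

```kotlin
// A minimal balanced Feistel network over a 30-bit domain (two 15-bit halves).
// Assumption: round function and round count are illustrative, not the library's.
const val HALF_BITS = 15
const val HALF_MASK = 0x7FFF // lowest 15 bits
const val ROUNDS = 4

// Hypothetical round function: any deterministic mix of (half, round, seed) works,
// because the Feistel structure is reversible regardless of what this returns.
fun roundFunction(half: Int, round: Int, seed: Int): Int {
    var x = half * 0x45D9F3B + seed + round * 0x9E3779B // Int overflow wraps, which is fine here
    x = x xor (x ushr 16)
    x *= 0x45D9F3B
    x = x xor (x ushr 13)
    return x and HALF_MASK
}

// Maps an ID in [0, 2^30) to another value in the same range.
fun feistelEncrypt(id: Int, seed: Int): Int {
    require(id in 0 until (1 shl 30)) { "id must fit in 30 bits" }
    var left = id ushr HALF_BITS
    var right = id and HALF_MASK
    for (round in 0 until ROUNDS) {
        val next = left xor roundFunction(right, round, seed)
        left = right
        right = next
    }
    return (left shl HALF_BITS) or right
}

// Runs the rounds in reverse order to recover the original ID.
fun feistelDecrypt(value: Int, seed: Int): Int {
    require(value in 0 until (1 shl 30)) { "value must fit in 30 bits" }
    var left = value ushr HALF_BITS
    var right = value and HALF_MASK
    for (round in ROUNDS - 1 downTo 0) {
        val previous = right xor roundFunction(left, round, seed)
        right = left
        left = previous
    }
    return (left shl HALF_BITS) or right
}
```

Each round only swaps the halves and XORs one of them with a value derived from the other, so decryption can undo the rounds one at a time. That is why the mapping is a permutation of the domain: every ID gets exactly one ciphered value, and no two IDs collide.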
Encoding
Base32 encoding lets us represent identifiers with a larger set of human-parsable characters. We use Douglas Crockford's Base32 implementation, which maps Base10 integers (0123456789) to a Base32 range (0123456789ABCDEFGHJKMNPQRSTVWXYZ). Note that we've constrained the domain to six Base32 characters, so the maximum ID value supported is 32^6 - 1 = 1,073,741,823.
- When converting an ID to a token, this step also left-pads the resulting token, so all tokens become the same length.
- When parsing a token from input, this step canonicalises confusing and lowercased characters into their formal representation (for example, i and l are canonicalised to 1).
| ID | → | Encoding without cipher | ← | Canonicalisation (example) |
|---|---|---|---|---|
| 1 | → | 000001 | ← | 00000i |
| 2 | → | 000002 | | |
| 3 | → | 000003 | | |
| 10 | → | 00000A | ← | 00000a |
| 1000 | → | 0000Z8 | ← | 0000z8 |
| 1073741823 | → | ZZZZZZ | ← | ZzzZzz |
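The padding and canonicalisation steps can be sketched in Kotlin as follows. The names `encodeBase32` and `decodeBase32` are hypothetical, and this sketch only canonicalises the look-alikes named in Crockford's specification (I and L to 1, O to 0); the library may be more forgiving than this.

```kotlin
const val CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// Encodes a non-negative ID as Crockford Base32, left-padded with '0' to a fixed length.
fun encodeBase32(id: Long, length: Int = 6): String {
    require(id >= 0) { "id must be non-negative" }
    val digits = StringBuilder()
    var remaining = id
    do {
        // Digits are produced least-significant first, then reversed below.
        digits.append(CROCKFORD_ALPHABET[(remaining % 32).toInt()])
        remaining /= 32
    } while (remaining > 0)
    require(digits.length <= length) { "id does not fit in $length characters" }
    return digits.reverse().toString().padStart(length, '0')
}

// Decodes user input, canonicalising lowercase and look-alike characters first.
fun decodeBase32(input: String): Long {
    var value = 0L
    for (raw in input) {
        val canonical = when (val upper = raw.uppercaseChar()) {
            'I', 'L' -> '1'
            'O' -> '0'
            else -> upper
        }
        val digit = CROCKFORD_ALPHABET.indexOf(canonical)
        require(digit >= 0) { "invalid character '$raw'" }
        value = value * 32 + digit
    }
    return value
}
```

Under these assumptions, `encodeBase32(1000)` returns `0000Z8`, and both `decodeBase32("0000z8")` and `decodeBase32("0000Z8")` return 1000, matching the table above.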
Encoding makes tokens safer to communicate over lossy media, such as over the phone, or scribbled on pieces of paper. One last step reduces confusion further.
Prefixing
Finally, each entity is represented by a short prefix, such as "C" for customer, "O" for order, or "INV" for invoice. In code, we create a token type for each of these, which also hard-codes the cipher seed.
This gives us instant type identification, reduces the risk of accidentally using the wrong token in the wrong place, and ensures tokens from different types never collide even if they share an underlying ID.
In previous examples, we've set the cipher domain and token size to six Base32 characters, allowing us to represent IDs up to 32^6 - 1 = 1,073,741,823. But this is an arbitrary limit. Tokens with seven or eight Base32 characters would push the ID ceiling higher.
With entity-specific prefixes, lengths, and cipher seeds, see how the same ID sequence is represented against our example token types:
| Example ID | Prefixed Customer token | Prefixed Order token | Prefixed Invoice token |
|---|---|---|---|
| 1 | C_71VHDH | O_Q4BDJQ | INV_ZJP9WNJA |
| 2 | C_1MXBJ1 | O_ZCXQ8F | INV_M9GZ29A2 |
| 3 | C_ZR5KA5 | O_2PN7M8 | INV_D9MEBYC4 |
| 10 | C_5GYN5Q | O_W88WFZ | INV_FE3T0SS5 |
| 1000 | C_AACVG7 | O_WHE1FY | INV_4PGCFN8B |
| 1073741823 | C_CHJHGK | O_VNT7CW | INV_8R95M5RK |
| 1099511627775 | (out of bounds) | (out of bounds) | INV_9G26G99K |
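One way to picture a token type is as a small value holding the prefix, the body length, and the seed. The Kotlin sketch below is an assumption about the shape, not the library's API: `TokenType`, `withPrefix`, and `bodyOf` are hypothetical names, and the Order and Invoice seeds are placeholders because the article only states the Customer seed.

```kotlin
// Hypothetical token type: a prefix, a body length in Base32 characters, and a cipher seed.
data class TokenType(val prefix: String, val bodyLength: Int, val seed: Int) {
    // Largest ID that fits in the body: 32^bodyLength - 1 (each character carries 5 bits).
    val maxId: Long get() = (1L shl (5 * bodyLength)) - 1

    // Attaches the prefix to an already-encoded body such as "71VHDH".
    fun withPrefix(body: String): String {
        require(body.length == bodyLength) { "body must be $bodyLength characters" }
        return "${prefix}_$body"
    }

    // Returns the body if the token carries this type's prefix and length, otherwise null.
    fun bodyOf(token: String): String? {
        val expectedStart = "${prefix}_"
        if (!token.startsWith(expectedStart)) return null
        return token.removePrefix(expectedStart).takeIf { it.length == bodyLength }
    }
}

val CUSTOMER = TokenType(prefix = "C", bodyLength = 6, seed = 41434539)
val ORDER = TokenType(prefix = "O", bodyLength = 6, seed = 1)     // placeholder seed
val INVOICE = TokenType(prefix = "INV", bodyLength = 8, seed = 2) // placeholder seed
```

Here `CUSTOMER.maxId` is 1,073,741,823 and `INVOICE.maxId` is 1,099,511,627,775, which is why the last row of the table is out of bounds for the six-character types. `ORDER.bodyOf("C_71VHDH")` returns null, so a customer token handed to order-handling code is rejected before any lookup happens.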
Bonus: Type safety
As an added benefit, these strongly-typed entity tokens improve type safety in our code. Consider the method signatures below:
fun getDeliveryForOrder(deliveryToken: String, orderToken: String) // these could be anything!
fun getDeliveryForOrder(deliveryToken: Token, orderToken: Token) // better, but still possible to switch them.
fun getDeliveryForOrder(deliveryToken: DeliveryToken, orderToken: OrderToken) // best, mistakes are caught immediately.
End-to-end considerations
Now that we have a solid understanding of how each step works, let's review some end-to-end considerations.
Security
Tokens solve an operational problem, not a security problem. The ciphering step prevents casual inference: a customer seeing C_71VHDH would not easily guess how many customers signed up before them, or that C_1MXBJ1 belongs to the next customer.
This cipher is fully reversible. Anyone with the Feistel seed can recover the original integer ID from any token. The seed is hard-coded as a configuration constant, visible in source code and deployment artefacts, and shouldn't be treated as a cryptographic secret.
Abstraction
Tokens don't replace IDs as the internal identifier. Tokens are generated once and persisted alongside the entity, so if the token conversion process changes (for example, if a service migrates from sequential integer IDs to UUIDs), existing tokens remain stable.
Consider a situation where a second customer database is brought into scope, using UUIDs as primary keys rather than sequential integers. Both databases could represent customer entities using CustomerTokens, with a different output format to avoid clashes with the original format. Code that accepts a CustomerToken doesn't need to know which source a given token came from, as long as it matches one of the expected formats.
- Customer entities with integer IDs use the process discussed above, and end up with six-character Base32-encoded tokens.
- Customer entities with UUIDs use a different process (for example, with uuid-base58), and end up with 22-character Base58-encoded tokens.
| Source | Key | Customer token |
|---|---|---|
| DB 1 (integer ID) | 10 | C_5GYN5Q |
| DB 2 (UUID) | 550e8400-e29b-41d4-a716-446655440000 | C_3NRpJeT1HaqFxQ5dJRaZBG |
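A parser that accepts either format could look like the Kotlin sketch below. Everything here is hypothetical: the type names, the `parseCustomerToken` function, and the assumption that the 22-character format uses the Bitcoin-style Base58 alphabet (no 0, O, I, or l).

```kotlin
// Hypothetical hierarchy: one CustomerToken type, two accepted formats.
sealed interface CustomerToken {
    val value: String
}

data class IntegerBackedCustomerToken(override val value: String) : CustomerToken
data class UuidBackedCustomerToken(override val value: String) : CustomerToken

// Six Crockford Base32 characters (no I, L, O, or U).
val BASE32_CUSTOMER_FORMAT = Regex("^C_[0-9A-HJKMNP-TV-Z]{6}$")

// Twenty-two Base58 characters, assuming the Bitcoin-style alphabet.
val BASE58_CUSTOMER_FORMAT = Regex("^C_[1-9A-HJ-NP-Za-km-z]{22}$")

// Returns a typed token when the input matches a known format, otherwise null.
fun parseCustomerToken(input: String): CustomerToken? = when {
    BASE32_CUSTOMER_FORMAT.matches(input) -> IntegerBackedCustomerToken(input)
    BASE58_CUSTOMER_FORMAT.matches(input) -> UuidBackedCustomerToken(input)
    else -> null
}
```

Callers only ever see `CustomerToken`. The two lengths never overlap, so the format alone is enough to tell the sources apart, and adding a third source later means adding one more branch rather than changing every consumer.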
Persistence also allows individual tokens to be assigned directly, without going through the cipher at all. In test environments you often want predictable, readable tokens for known fixtures, rather than whatever the cipher happens to produce for a given ID:
INSERT INTO customers (token, name, ...) VALUES ('C_TEST_SENDER', 'Sandy Sender', ...);
Error detection
Crockford's Base32 specification defines an optional check symbol that can be appended to any encoded value, which is computed as value mod 37, mapped to one of 37 symbols (the standard 32 characters plus five extras: *~$=U). This catches the most common transcription mistakes, such as a single wrong character, or two adjacent characters swapped. We can try this out using the generator from before:
C_71VHDHR
Appending a check symbol lengthens the token by one character, but means a service entrypoint can reject malformed tokens during validation, instead of performing a database lookup that will never match.
Check symbols are an optional extension. Use them if you need to communicate tokens over the phone, or on handwritten notes. Skip them when tokens are only ever generated and consumed programmatically.
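The check symbol calculation is small enough to sketch in Kotlin. The names `checkSymbol` and `hasValidCheckSymbol` are hypothetical, and the validator assumes an already-canonicalised, unprefixed body followed by one check symbol.

```kotlin
// Crockford's 32 encoding symbols followed by the five check-only symbols.
const val CHECK_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~\$=U"

// The check symbol is the decoded value modulo 37, looked up in the 37-symbol alphabet.
fun checkSymbol(value: Long): Char {
    require(value >= 0) { "value must be non-negative" }
    return CHECK_ALPHABET[(value % 37).toInt()]
}

// Decodes the body as Base32, then compares the trailing symbol with the expected one.
fun hasValidCheckSymbol(bodyWithCheck: String): Boolean {
    if (bodyWithCheck.length < 2) return false
    var value = 0L
    for (c in bodyWithCheck.dropLast(1)) {
        val digit = CHECK_ALPHABET.indexOf(c)
        // Only the first 32 symbols are valid inside the body.
        if (digit !in 0 until 32) return false
        value = value * 32 + digit
    }
    return checkSymbol(value) == bodyWithCheck.last()
}
```

The body 71VHDH decodes to 236832177, and 236832177 mod 37 is 24, which maps to R. That matches the C_71VHDHR example above. A single changed character (71VHDJR) or two swapped neighbours (71VHHDR) produce a different remainder, so `hasValidCheckSymbol` returns false for both.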
In conclusion
This approach works most effectively in systems where entity identifiers cross service boundaries, or appear in customer-facing contexts.
- Ciphering makes tokens visually distinct, so people are less likely to confuse sequential entities.
- Encoding makes tokens more visually compact, and easier to communicate.
- Prefixes reduce token ambiguity when working with multiple entity types.
Keep IDs doing what they do best - uniquely identifying entity records in a database, and enforcing referential integrity between entities. Outside of the service boundary, prefer tokens. For further reading, check out my reference implementation at GitHub: TassSinclair/tokens. This is the approach we use in OmniTabz systems, inspired by the Base32 tokens we used at Cash App.
If you have feedback or questions about this article, let's catch up via Mastodon, LinkedIn, or email.
