How IDs are generated in Penpot

Lets beging with some context

This post was originated from a conversation I had with the Taiga development team about how we generate ID’s in Penpot. The idea was to know what decisions we have made in this regard and understand the implementation details.

At Penpot, since its inception, we made the decision to use UUID to generate ID’s. If you are not familiar with the subject, you’ll find useful this good article on wikipedia that explains a
bit about UUID. The simplest rationale for generating IDs is that we needed to be able to create IDs on both the frontend and the backend. From the start, we already had a distributed ID generation scenario but a simple incremental in the database table was not enough for us.

The quick approach would have been to use the typical UUIDv4, but as this variant is mainly generated with random data it implies certain problems: it’s not sortable and causes fragmentation in the database indexes, thus degrading query performance over time. So we began by using UUIDv1.

What alternatives we considered before choosing UUID?

We also considered several alternatives before going via the UUID route:

  • ulid: The basic idea behind ulid is very interesting because it is very close to the final solution we have taken. The main problem it’s to do with the fact that if we want to store it efficiently, we would have to decode the value to two int64s and store it in the database in two separate fields, which would make all queries more cumbersome. I’m aware that, as an alternative, we can take advantage of the UUID type of the database to store the 128 bits of this ID there, but semantically it would be possibly confusing. Storing it in a string format is not an option, because that would imply having more data to store and therefore larger indexes.
  • sno: in the same line as ulid, implemented only in go
  • nanoid: very similar to UUID v4. Plus you need to save the value as a string, hence we discarded this option very quickly.

There are many custom implementations with different characteristics and there are no plan for list them all.

Clearly, our choice to use UUIDv1 doesn’t comes without its own problems: unlike v4, it’s partially sortable because it strangely places the least significant bits of timestamp at the start in binary encoding, unusual 100-ns precision gregorian epoch, and also it needs access to the MAC address among other less important things. As for our case, we expected to use it always in virtual environments, so the access to MAC was not so problematic. And at the frontend, we emulated the mac generating random data in the first use.

After the conversation, I’ve decided to review how we’re generating uuids and I’ve come across this new revision of the UUID RFC where the new 3 UUID formats are outlined: the v6, v7 and v8.

To summarize, we have v6 which is similar to v1 with an improved timestamp bits ordering, v7 (with its variants 1 and 2) mixes between v4 and v6 ,with a more standard epoch, precision in milliseconds combining it with a random part instead of the MAC address. And finally v8, which basically defines only the minimum necessary to preserve compatibility with the rest of the formats and leaves the rest to the choice of the custom implementation.

The current approach

I have chosen to take good ideas from ULID and UUID v6 and implement a custom one using the UUID v8 format. These are the properties of our new UUID implementation:

  • 48bits timestamp (milliseconds precision, custom epoch: 2022-01-01 00:00:00)
  • 14bits random clockseq (monotonically increasing on timestamp collision, allows 16384 ids/ms per host)
  • blocks (no exceptions raised) if clockseq overflows
  • 56bits of randomness, generated on library initialization
  • 4bits for user defined tag (allows distinguish between environments where the ID is generated)
  • don’t run out of range until 10941 AD
  • on clock regression, the 56bits of static randomness are reinitialized

We maintain compatibility with the current solution as we continue to use the same type of data that is efficiently stored in the database; have better visual readability; better sortability and greatly improved performance. You can take a look at the JVM implementation here and the JS implementation here.

Here below is a benchmark that illustrates the difference in performance between different implementations and versions:

;; This is a standard jdk way of generate UUID v4
user=> (perf/bench :f #(java.util.UUID/randomUUID) :name "uuid-v4-java")
=> benchmarking: uuid-v4-java
--> TOTAL: 9.95sec
--> MEAN:  668.6ns

;; This is the implementation we used previously
;; https://github.com/danlentz/clj-uuid
user=> (perf/bench :f #(clj-uuid/v1) :name "uuid-v1-clj")
=> benchmarking: uuid-v1-clj
--> TOTAL: 10.01sec
--> MEAN:  300.16ns

;; This is third-party java library that generates UUID v1
user=> (perf/bench :f #(UuidCreator/getTimeBased) :name "uuid-v1-java-uc")
=> benchmarking: uuid-v1-java-uc
--> TOTAL: 10.02sec
--> MEAN:  100.74ns

;; This is third-party java library that generates UUID v7 (canonical)
user=> (perf/bench :f #(UuidCreator/getTimeOrderedEpoch) :name "uuid-v7.0-java-uc")
=> benchmarking: uuid-v7.0-java-uc
--> TOTAL: 10.52sec
--> MEAN:  448.02ns

;; This is third-party java library that generates UUID v7 (variant 1)
user=> (perf/bench :f #(UuidCreator/getTimeOrderedEpochPlus1) :name "uuid-v7.1-java-uc")
=> benchmarking: uuid-v7.1-java-uc
--> TOTAL: 10.33sec
--> MEAN:  97.86ns

;; This is our in-house implementation of UUID v8
user=> (perf/bench :f #(app.common.UUIDv8/create) :name "uuid-v8-java-penpot")
=> benchmarking: uuid-v8-java-penpot
--> TOTAL: 10.09sec
--> MEAN:  89.26ns

Original post: niwi.nz : How IDs are generated in Penpot

2 Likes

I’m curious about your clj-uuid benchmarks as they seem to differ from mine. I added new criterium bench timings at GitHub - danlentz/clj-uuid: RFC9562 Unique Identifiers (v1,v3,v4,v5,v6,v7,v8,squuid) for Clojure to reflect the new release, although I’d also note that bench will give you only single threaded results and in something of worst case (pressuring the engine to stall with >10k UUIDs per ms).

Just a thought. New release of clj-uuid 0.2.0 but I am always interested in improving, if you have feedback.

Best!
Dan

1 Like

Hello @danlentz

Thanks for the reply, great to see clj-uuid with v7 support :smiley:

I’ve repeated the tests using criterium, with the latest version of clj-uuid on JDK 23 (+ZGC, should not significantly affect the result), and the results are as follows:

user=> (run-quick-bench' (clj-uuid/v1))
Evaluation count : 3042738 in 6 samples of 507123 calls.
nil
             Execution time mean : 197.286001 ns
    Execution time std-deviation : 2.497377 ns
   Execution time lower quantile : 195.621258 ns ( 2.5%)
   Execution time upper quantile : 201.324289 ns (97.5%)
                   Overhead used : 1.300203 ns

Found 1 outliers in 6 samples (16.6667 %)
        low-severe       1 (16.6667 %)
user=> (run-quick-bench' (clj-uuid/v7))
Evaluation count : 602424 in 6 samples of 100404 calls.
nil
             Execution time mean : 1.005780 µs
    Execution time std-deviation : 8.163028 ns
   Execution time lower quantile : 996.245239 ns ( 2.5%)
   Execution time upper quantile : 1.015041 µs (97.5%)
                   Overhead used : 1.300203 ns
user=> (run-quick-bench' (app.common.UUIDv8/create))
Evaluation count : 7937742 in 6 samples of 1322957 calls.
nil
             Execution time mean : 74.914179 ns
    Execution time std-deviation : 0.451792 ns
   Execution time lower quantile : 74.391168 ns ( 2.5%)
   Execution time upper quantile : 75.364965 ns (97.5%)
                   Overhead used : 1.300203 ns

All this, with the CPU limited to 1.6GHZ (to eliminate possible differences with the CPU boost and thermal throttling).

It is clear that the version of uuidv8 implemented in penpot has its own tradeoffs, and one of them is that it can generate a maximum of 16384 ids per millisecond and that the generation of IDs in multiple threads may suffer some overhead due to lock contention. But that is why it is a specific implementation for penpot, where those tradeoffs are perfectly acceptable.