Sequencer
We need a sequencer to generate unique IDs that tag every event in a distributed environment. These IDs are a very useful tool for analysis and debugging.
Unique IDs are important for identifying events and objects within a distributed system. However, designing a unique ID generator within a distributed system is challenging.
Requirements for unique identifiers
- Uniqueness: We need to assign unique identifiers to different events for identification purposes.
- Scalability: The ID generation system should generate at least a billion unique IDs per day.
- Availability: Since multiple events can happen even at the level of nanoseconds (10^-9 s), our system should generate IDs for all the events that occur.
- 64-bit numeric ID: We restrict the length to 64 bits because this bit size is enough for many years to come. Let's calculate the number of years after which our ID range will wrap around.
  - Total numbers available = 2^64 ≈ 1.8446744 × 10^19
  - Estimated number of events per day = 1 billion = 10^9
  - Number of events per year = 365 billion = 365 × 10^9
  - Number of years to deplete the identifier range = 2^64 / (365 × 10^9) ≈ 50,539,024.8595 years
- 64 bits should be enough for a unique ID length considering these calculations.
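The arithmetic in the list above can be reproduced in a few lines of Python. This is only a back-of-the-envelope check; the constant names are illustrative, and the per-second figure assumes the daily load is spread evenly over the day.

```python
# Back-of-the-envelope check of the 64-bit ID space, using the figures above.
TOTAL_IDS = 2 ** 64             # size of a 64-bit ID space
IDS_PER_DAY = 10 ** 9           # scalability target: one billion IDs per day
IDS_PER_YEAR = 365 * IDS_PER_DAY

# How long until every 64-bit value has been handed out once.
years_to_deplete = TOTAL_IDS / IDS_PER_YEAR

# Average generation rate, assuming a perfectly uniform load (an assumption).
ids_per_second = IDS_PER_DAY / (24 * 60 * 60)

print(f"Total IDs: {TOTAL_IDS:.7e}")
print(f"Years to deplete: {years_to_deplete:,.4f}")
print(f"Average rate: {ids_per_second:,.0f} IDs per second")
```

Running it reproduces the roughly 50,539,024.8595 years computed above and an average of about 11,574 IDs per second; peak rates in a real system are likely to be higher than this average.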
First solution: UUID
A straw man solution for our design uses UUIDs (universally unique IDs). A UUID is a 128-bit number, and in hexadecimal it looks like 123e4567-e89b-12d3-a456-426614174000. This gives us about 10^38 numbers (2^128 ≈ 3.4 × 10^38). UUIDs have different versions. We opt for version 4, which generates a pseudorandom number.
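As a minimal sketch, Python's standard `uuid` module can parse the example string from the text and generate version 4 UUIDs. The bit counts in the comments follow the standard UUID layout: version 4 reserves 4 bits for the version and 2 for the variant, so only 122 of the 128 bits are random.

```python
import uuid

# A UUID is a 128-bit value; the example from the text parses as a standard UUID.
example = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
assert example.int.bit_length() <= 128

# Version 4 UUIDs are built from random bits by the standard library.
generated = uuid.uuid4()
print(generated)            # 8-4-4-4-12 hex groups, different on every run
print(generated.version)    # 4

# 2^128 total values (about 3.4 x 10^38); version 4 fixes 6 of those bits
# (4 version bits + 2 variant bits), leaving 122 random bits.
print(f"{2 ** 128:.2e}")    # 3.40e+38
print(f"{2 ** 122:.2e}")    # 5.32e+36
```

Because each call draws fresh random bits locally, any server can run this without talking to other servers.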
Pros
Each server can generate its own ID and assign the ID to its respective event. No coordination between servers is needed, since generating a UUID doesn't depend on any other server. Scaling up and down is easy with UUIDs, and this system is also highly available. Furthermore, it has a low probability of collisions. The design for this approach is given below:
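The "low probability of collisions" can be made concrete with the standard birthday-bound approximation, P ≈ 1 − e^(−n²/(2·2^b)), where n is the number of IDs generated and b is the number of random bits. The sketch below assumes ideal randomness, b = 122 for version 4 UUIDs, and the one-billion-IDs-per-day target from the requirements; `collision_probability` is a hypothetical helper name.

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID (assumes an ideal random source)


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Hypothetical helper: birthday-bound estimate of P(at least one duplicate)
    among n independently generated random IDs."""
    space = 2 ** bits
    # -expm1(-x) computes 1 - e^(-x) accurately even when x is tiny.
    return -math.expm1(-(n * n) / (2 * space))


per_day = 10 ** 9
per_year = 365 * per_day
print(f"one day:  {collision_probability(per_day):.2e}")
print(f"one year: {collision_probability(per_year):.2e}")
```

Even for a full year of IDs, the estimate stays far below one in a billion, which is why collisions are called unlikely rather than impossible.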
Cons
Using 128-bit numbers as primary keys makes primary-key indexing slower, which results in slow inserts. A workaround might be to interpret an ID as a hex string instead of a number. However, non-numeric identifiers might not be suitable for many use cases. The ID also isn't 64 bits long. Moreover, there's a chance of duplication. Although this chance is minimal, we can't claim that UUIDs are deterministically unique. Additionally, UUIDs given to clients over time might not be monotonically increasing. The following table summarizes the requirements we have fulfilled using UUID:
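Two of the drawbacks listed above, the size and the lack of ordering, can be checked directly. This sketch uses Python's standard `uuid` module; the sample size of 1,000 is arbitrary.

```python
import uuid

MAX_64_BIT = 2 ** 64 - 1

ids = [uuid.uuid4() for _ in range(1000)]

# No version 4 UUID fits the 64-bit requirement: the version field (0100 in
# bits 76-79) always sets bit 78, so the integer value is at least 2^78.
fits_in_64_bits = any(u.int <= MAX_64_BIT for u in ids)

# IDs issued one after another are not ordered by value.
monotonic = all(a.int < b.int for a, b in zip(ids, ids[1:]))

print("any ID fits in 64 bits:", fits_in_64_bits)  # False
print("issued in increasing order:", monotonic)    # almost surely False
```

The second check is probabilistic, but with random bits the chance that 1,000 consecutive IDs come out in increasing order is negligible.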