Skip to main content

Ordering events in distributed systems

·7 mins

In everyday life, we have a strong sense of time and cause-and-effect.

Consider frying an egg: one must get an egg, break the egg, add it to the frying pan, and then let it cook. Cooking the egg can only happen after the egg is broken, and we know that cooking happens at a later time than the breaking.

For us, the order things happen in and the time they take place at are intrinsically linked. It’s just intuitive!

Logical time #

It’s not that simple when it comes to distributed systems, where events or changes happen across multiple processes and machines. The regular system clock time could be different across each machine. Plus, there’s not even a guarantee that the system clocks will increase predictably at the same rate (for example, if the machine clock drifts and has to correct back to a past time).

In spite of all this, nodes across the system must agree somehow on the order of events. This is the concept of logical time. To track logical time, we’ll have to come up with logical timekeepers aka clocks, that capture this ordering.

Requirements of keeping logical time #

The implementation of a logical clock must assume that there is no central time source and that there are multiple processes/nodes – as the case naturally is with distributed systems.

The processes or nodes in the system may also communicate with each other.

Each approach also needs to ensure causal precedence holds true: “if event A caused event B, then event A happened before event B (according to our clock)”.

To clarify this a bit, cause/effect is only a thing when nodes are dependent on events from another node. A node that never communicates an event to another node cannot really “cause” something on another node–that is called a concurrent event. This all leads to a concept called partial ordering, where within the system we might not know the global order of events, but we can know the ordering for some events that depend on each other via cause/effect.

Let’s take a look at a few of the approaches to logical clocks!

Lamport timestamps #

Lamport timestamps are one of the earliest approaches to logical clocks, which Leslie Lamport came up with in 19781. Notably, they are scalar clocks.

The basis of this is that each node in the distributed system keeps a counter. Each time an event occurs on that node, the counter is incremented.

The other part of this is based on nodes passing messages to each other.

When a node messages another node, it includes the current timestamp (counter value) in the message. The receiving node checks the timestamp in the message. If the received timestamp is greater than the receiving node’s own, the node updates its own timestamp to be one above the timestamp in the message.

To actually determine the order events happened in, these timestamps are compared. For tiebreaking, we can use the timestamp that came from the lower node ID.

Here’s an example showing a three nodes in a system with Lamport timestamps:

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram participant Node_A participant Node_B participant Node_C Note over Node_A,Node_C: all counters = 0 Note over Node_A: Event (counterA=1) Node_A->>Node_B: Message (value=1) Note over Node_B: Received (counterB --> 2) Note over Node_A: Event (counterA=2) Note over Node_A: Event (counterA=3) Note over Node_A: Event (counterA=4) Node_B->>Node_C: Message (value=2) Note over Node_C: Received (counterC --> 3) Node_C->>Node_A: Message (value=3) Note over Node_A: Received (counterA remains 4)

Where do Lamport timestamps fail? #

Scenario 1 #

Suppose a message is sent with the same counter value by two nodes at the same time to a different node.

In the Lamport world, the tiebreaking will make Node C pick Node A’s counter since Node A has an earlier identifier.

However, we have lost the causal relationship between Node B and C since the corresponding event was superseded by the message from Node A.

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram participant Node_A participant Node_B participant Node_C Note over Node_A,Node_C: all counters = 0 Note over Node_A: Event (counterA=1) Note over Node_B: Event (counterB=1) par Simultaneous messaging Node_A->>Node_C: Message (value=1) and Node_B->>Node_C: Message (value=1) end Note over Node_C: Event (counterC=2)

Scenario 2 #

Suppose Node B has two events that update its counter and then later (in physical time), Node A has its first event.

Since Node A’s counter is lower, does that mean Node A’s event happened earlier than Node B’s latest event?

It’s tough to say!

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram participant Node_A participant Node_B Note over Node_A,Node_B: all counters = 0 Note over Node_B: Event (counterB=1) Note over Node_B: Event (counterB=2) Note over Node_A: Event (counterA=1)

Vector clocks #

It’s a lot less clear who came up with vector clocks2, so we’ll rely on what Raynal and Singhal have collected3 based on previous research.

Where Lamport timestamps falter, vector clocks try to improve the “global” understanding of order across nodes.

Fundamentally, a vector clock introduces another dimension versus scalar clocks (like Lamport timestamps).

Each process/node in the system stores an integer for all the actors in the system. So in a three-node system, each node will have a vector of length 3 and the i-th element of the vector corresponds to the integer for the i-th node (as known by the current node).

Before a node (say Node K) performs an event, within its vector, it will increment the counter corresponding to itself (the k-th element) by some fixed number (typically 1).

Nodes will message each other with the sender’s full vector in the message.

The receiving node will update its own vector like so: for the i-th position, take the maximum of the i-th element of the sender’s vector and i-th element of the receiver’s own vector. The receiver does that for all positions in the vector. Then the receiver will increment the counter corresponding to itself (the k-th element) by some fixed number, as it if had its own event.

Let’s look at the following scenario.

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram participant Node_A participant Node_B participant Node_C Note over Node_A,Node_C: Initial vectors: [0,0,0] Note over Node_A: Event ([1,0,0]) Note over Node_B: Event ([0,1,0]) Note over Node_C: Event ([0,0,1]) Note over Node_A,Node_C: Causal relationships preserved whether Node A or B messages first par Two messages Node_A->>Node_B: Message ([1,0,0]) and Node_C->>Node_B: Message ([0,0,1]) end Note over Node_B: Received ([1,3,1]) Note over Node_B: Event ([1,4,1]) Node_B->>Node_A: Message ([1,4,1]) Note over Node_A: Received ([2,4,1]) Note over Node_C: Event ([0,0,2])

We immediately see a few benefits. The ordering of each node’s own events is not lost when messaging back and forth–richer ordering with causal events. Also, as each node updates its own events when not messaging (i.e. concurrent), it still retains some partial knowledge of other node’s events.

Comparing vector timestamps #

Comparing vector timestamps is more meaningful and useful than comparing scalar timestamps, but it does mean a few more rules.

Vector timestamps are equal if the corresponding elements at a position are equal between both vectors.

One timestamp is less or equal than another if the first has every element is less than or equal to its corresponding element in the second vector.

  • For strict “less than” comparisons, it’s almost the same rule, but element-wise has to be strict “less than”

Downsides #

Without going into too much detail, the main drawbacks of vector clocks are:

  • Amount of space complexity increases by the number of nodes in the system. This affects both storage and the size of the messages that need to be sent.
  • They are still subject to partial ordering, where if two events are not causal, we cannot tell which happened before the other in physical time.

Closing remarks #

There’s been a lot of research and experimentation put into solving the event-ordering problem in distributed systems.

In general, it remains a problem that’s impossible to truly solve, but the approaches become easier to implement and more efficient.

That said, it’s important for a distributed system to decide on some strategy for event ordering, and it’s up to the need of the specific system (e.g. consistency guarantees) which logical clock / ordering approach to use.


  1. Leslie Lamport. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558–565. https://doi.org/10.1145/359545.359563 ↩︎

  2. https://decomposition.al/blog/2023/04/08/who-invented-vector-clocks/ ↩︎

  3. Michel Raynal and Mukesh Singhal. 1996. Logical Time: Capturing Causality in Distributed Systems. Computer 29, 2 (February 1996), 49–56. https://doi.org/10.1109/2.485846 ↩︎