The CAP theorem, and what does Google Spanner have to do with it?

In 2012, Google came up with a new globally-distributed database called Spanner. Inventing a distributed database is cool, but it had been done before. What was different about Google’s design was that it seemed to shake a piece of fundamental understanding–namely, the CAP theorem.

There was a lot of buzz about Spanner breaking the CAP theorem, which is a strong claim. What would this mean for the future of software systems, and would there be far-reaching implications?

The software world has had a decade since then to ponder this question. Let’s retrace those steps.

What is the CAP theorem? #

Eric Brewer introduced this idea in a talk on distributed systems in 2000 [1]. Then a few years later, Seth Gilbert and Nancy Lynch at MIT formally proved it [2], and Brewer’s conjecture became a theorem!

Brewer’s theorem, or the CAP theorem as it’s commonly known, asserts that a system with shared, distributed data can have at most two of these three properties: Consistency, Availability, Partition tolerance.

The CAP properties #

Let’s briefly describe the three properties. These are in the context of a distributed data store, which may have multiple nodes.

Suppose each read or write gets or modifies a particular piece of information. Then:

  • Consistency: Every read request will reflect the data of the latest write, or will return an error
  • Availability: Every read request will return data (rather than an error), though not necessarily the data of the latest write
  • Partition tolerance: The system can still process requests even when there is a network partition [3] between its nodes
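To make the consistency definition concrete, here is a minimal, illustrative sketch in Python (the trace format and function name are my own inventions, not from any particular system). It checks whether a sequence of operations on a single key obeys the rule that every successful read returns the value of the latest preceding write.

```python
# Illustrative only: a tiny checker for the "consistency" rule above.
# A trace is a list of operations against one key, in wall-clock order.
# Writes are ("write", value); reads are ("read", value) or ("read", "error").

def is_consistent(trace):
    latest = None            # value of the most recent write seen so far
    for op, value in trace:
        if op == "write":
            latest = value
        elif op == "read":
            # A read may return an error, but if it returns data,
            # that data must reflect the latest write.
            if value != "error" and value != latest:
                return False
    return True

# A consistent trace: the last read errors instead of returning stale data
# (similar in spirit to the PC diagram below).
print(is_consistent([("write", 77), ("read", 77), ("write", 42), ("read", "error")]))  # True

# An available-but-inconsistent trace: the last read returns an old value
# (similar in spirit to the PA diagram below).
print(is_consistent([("write", 60), ("read", 60), ("write", 55), ("read", 60)]))       # False
```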

Partition-tolerant and consistent (PC) #

These sorts of systems generally aim for strong consistency, though this can increase latency. They typically run consensus protocols and try to ensure that a majority of (or even all) nodes receive each write. This is a good choice for financial or medical systems that require strict data correctness.

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram actor U1 as User 1 box System participant N1 as Node 1 participant N2 as Node 2 end actor U2 as User 2 rect rgba(255, 247, 217) U1->>N1: Write Data (key=x, value=77) N1-->N2: Data propagates U2->>N2: Read Data (key=x) N2-->>U2: Success! Value is 77 break N1-->N2: Network partition occurs end U2->>N2: Write Data (key=y, value=42) U1->>N1: Read Data (key=y) N1-->>U1: Error: Unable to ensure consistency end

When a network partition occurs, nodes will no longer know for sure if they have the latest data. They will respond to read requests with an error to avoid sending stale data.
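As a rough sketch of this behavior (a toy model with made-up names, not how any particular database implements it): a node only answers a read if it can still reach a majority of the cluster, and returns an error otherwise.

```python
# Toy model of a partition-tolerant, consistent (PC) node.
# Real systems would use a consensus protocol such as Paxos or Raft.

class PCNode:
    def __init__(self, peers, total_nodes):
        self.store = {}
        self.peers = peers          # peers this node can currently reach
        self.total = total_nodes    # cluster size, including this node

    def has_quorum(self):
        # This node plus its reachable peers must form a majority.
        return (len(self.peers) + 1) > self.total // 2

    def read(self, key):
        if not self.has_quorum():
            raise RuntimeError("Unable to ensure consistency (no quorum)")
        return self.store.get(key)

# Before a partition: this node can reach the other two nodes of a 3-node cluster.
node = PCNode(peers=["node2", "node3"], total_nodes=3)
node.store["x"] = 77
print(node.read("x"))               # 77

# After a partition isolates this node, reads fail instead of going stale.
node.peers = []
try:
    node.read("x")
except RuntimeError as err:
    print(err)                      # Unable to ensure consistency (no quorum)
```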

Partition-tolerant and available (PA) #

Systems of this kind try to guarantee that readers always get some data back, even if it is not the latest. They are much more commonly used when serving a high volume of requests at low latency is important, which makes them a good fit for real-time communication and messaging systems or streaming workloads.

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram actor U1 as User 1 box System participant N1 as Node 1 participant N2 as Node 2 end actor U2 as User 2 rect rgba(255, 247, 217) U1->>N1: Write Data (key=x, value=60) N1-->N2: Data propagates U2->>N2: Read Data (key=x) N2-->>U2: Success! Value is 60 break N1-->N2: Network partition occurs end U2->>N2: Write Data (key=x, value=55) U1->>N1: Read Data (key=x) N1-->>U1: Success! Value is 60 end

After a network partition, read requests will still be able to get data, though that data may be outdated.
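A correspondingly rough sketch of the PA side (again a toy model with made-up names): each node keeps serving its local copy during a partition, and replicas reconcile later, for example via last-write-wins, once connectivity returns.

```python
import itertools

_clock = itertools.count(1)   # toy logical clock shared by all nodes

# Toy model of a partition-tolerant, available (PA) node.
# It always answers reads from its local copy, even when partitioned.

class PANode:
    def __init__(self):
        self.store = {}           # key -> (value, logical_timestamp)

    def write(self, key, value):
        self.store[key] = (value, next(_clock))

    def read(self, key):
        value, _ = self.store[key]
        return value              # possibly stale, but always available

    def merge(self, other):
        # Last-write-wins reconciliation once the partition heals.
        for key, (value, ts) in other.store.items():
            if key not in self.store or ts > self.store[key][1]:
                self.store[key] = (value, ts)

n1, n2 = PANode(), PANode()
n1.write("x", 60)
n2.merge(n1)              # data propagates before the partition
n2.write("x", 55)         # written on the other side of the partition
print(n1.read("x"))       # 60: stale, but the read succeeds

n1.merge(n2)              # partition heals; replicas reconcile
print(n1.read("x"))       # 55
```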

Available and consistent (AC) #

This is a variation not frequently discussed–mostly because it’s not really viable for real use. One way this might happen is if there is just a single node–though that’s no longer a distributed system.

Alternatively, suppose we have at least two nodes. While there is no network partition, the system can reasonably aim for consistent data and availability of that data.

%%{ init: { "sequence": { "mirrorActors": false } } }%% sequenceDiagram actor U1 as User 1 box System participant N1 as Node 1 participant N2 as Node 2 end actor U2 as User 2 activate N1 activate N2 rect rgba(255, 247, 217) U1->>N1: Write Data (key=x, value=60) N1-->N2: Data propagates U2->>N2: Read Data (key=x) N2-->>U2: Success! Value is 60 break N1-->N2: Network partition occurs end deactivate N1 Note over N1: Disconnect Node 1 from the system end

Since this system is not partition tolerant, when a network partition occurs it might simply disconnect or disable nodes. Essentially, it can’t operate across a partition, so it removes nodes until there is no partition left to cross.
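As a hedged sketch of what “disconnect or disable nodes” could look like (purely illustrative; a real deployment would do this with health checks and load balancers): a coordinator drops any node it can no longer reach from the serving set, so the remaining nodes never have to operate across a partition.

```python
# Toy model of an available-and-consistent (AC) setup: rather than tolerating
# a partition, the system shrinks its membership until no partition remains.

class Cluster:
    def __init__(self, nodes):
        self.serving = set(nodes)     # nodes currently allowed to serve traffic

    def heartbeat(self, reachable):
        # Any node we can no longer reach is fenced off entirely.
        for node in self.serving - set(reachable):
            print(f"Disconnecting {node} from the system")
        self.serving &= set(reachable)
        return self.serving

cluster = Cluster(["node1", "node2"])
print(sorted(cluster.heartbeat(["node1", "node2"])))   # ['node1', 'node2']: all healthy

# A partition makes node1 unreachable; it is removed rather than tolerated.
print(sorted(cluster.heartbeat(["node2"])))            # ['node2']
```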

Spanner and CAP #

Let’s revisit Spanner! Google described it in its 2012 paper, which claims that Spanner is globally distributed with high availability and strong consistency.

That sounds great–it almost magically offers a trifecta of properties (availability, consistency, partition tolerance).

But per the CAP theorem, that shouldn’t be possible, right?

Interestingly, Eric Brewer himself weighed in on the matter. In a Google whitepaper [4], Brewer confirms a few things:

  • Spanner is consistent in the CAP sense
  • Network partitions can still happen, and when they do, Spanner forfeits availability

Those facts technically make Spanner a consistent and partition-tolerant system.

The actual magic lies in the probabilities. Brewer affirms that, as perceived by its users, Spanner has extremely high availability in practice. And when Spanner does have an outage, it is very unlikely to be caused by a network partition–in the grand scheme, partitions rarely have an effect.

How does Google achieve those odds? It runs its own private global network using controlled routers and links, along with redundant equipment in data centers. The whole network is administered in a way that nearly prevents partitions outright.
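To get a feel for what “very high availability” means in time terms, here is a quick back-of-the-envelope conversion (the availability levels below are generic examples for illustration, not figures from the whitepaper):

```python
# Convert an availability level into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label} ({availability:.5f}): ~{downtime:.1f} minutes of downtime per year")
```

If only a tiny fraction of that already-small downtime budget is ever caused by network partitions, then from a user’s point of view the system behaves, for all practical purposes, as if partitions did not exist.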

Concluding thoughts #

It’s vital to test and re-check whether prior assumptions and understanding of the world still hold. With Google’s Spanner, the software world had a chance to do just that.

In this case, it mostly re-affirmed the CAP theorem. So from a theoretical perspective, nothing really changed. However, it helped show that it’s possible to innovate to the point where the system feels like a magical breakthrough in practice.


  1. Eric A. Brewer. 2000. Towards robust distributed systems (abstract). In Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing (PODC ’00). Association for Computing Machinery, New York, NY, USA, 7. https://doi.org/10.1145/343477.343502

  2. Seth Gilbert and Nancy Lynch. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33, 2 (June 2002), 51–59. https://doi.org/10.1145/564585.564601

  3. A network partition is when some nodes within the system cannot necessarily communicate with other nodes. This may be because of a connection loss or different nodes being grouped in different sub-networks.

  4. Eric Brewer. 2017. Spanner, TrueTime and the CAP Theorem. Google whitepaper.