
Approaches to load balancing


Suppose you’re building a sports streaming service.

You want to be able to scale the system, so you set it up across a bunch of nodes. One weekend, a huge international tournament happens. Say the expected peak number of viewers is about 10 million. You think, “Sure, that’s fine – we have 30 nodes and each node can handle up to 500,000 streams.”

On the big day, you’re watching the system as more people connect to watch the tournament. It starts out fine, but you notice that some of the nodes are rapidly approaching 500,000 viewers each. Not too surprising, but those numbers keep climbing while a bunch of other nodes are sitting at less than 200,000 viewers each.

This is starting to look bad. What’s going to happen at peak viewership?

If only there was a way to distribute the traffic and use all the nodes more effectively, that would definitely help.

That’s the case for load balancing!

So what is load balancing? #

Load balancing is essentially distributing tasks (like requests in our example above) evenly across multiple resources (nodes in our case). What “even” means depends on the particular load balancing approach.

A load balancer is the specific apparatus that stands between incoming tasks and the resources that do the work. It’s generally some real (or virtual) entity that uses software or hardware techniques to assign tasks evenly to resources.

Where does the load balancer sit? #

Where’s a good place to inject a load balancer into the system? The only strict requirement is that it’s somewhere between receiving a request and processing that request.

Commonly, though, it’s at one of these two levels, based on the OSI model:

  • Layer 4: routing decisions are made at the “Transport” layer using basic network information like IP addresses/ports. This tends to be fairly fast and efficient.
  • Layer 7: here, routing happens at the “Application” layer. It’s slower than Layer 4 techniques, but it has access to richer information (like HTTP headers and URL paths) to make the routing decision.
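
To make the distinction concrete, here’s a minimal Python sketch of the two decision styles. The pool names, port number, and paths are all hypothetical, chosen just for illustration:

```python
# Hypothetical backend pools for illustration.
API_NODES = ["api-1", "api-2"]
STREAM_NODES = ["stream-1", "stream-2", "stream-3"]

def route_l4(dest_port: int) -> list[str]:
    """Layer 4: only network-level facts (addresses/ports) are visible."""
    return API_NODES if dest_port == 8443 else STREAM_NODES

def route_l7(http_path: str) -> list[str]:
    """Layer 7: the request itself (here, the URL path) can be inspected."""
    return API_NODES if http_path.startswith("/api/") else STREAM_NODES

pool_a = route_l4(8443)                  # decided without parsing the request
pool_b = route_l7("/stream/finals-day")  # decided by inspecting the request
```

The Layer 4 function never touches the request body or headers, which is why it can be faster; the Layer 7 function has to parse the request first, but can make much finer-grained decisions.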

Also worth noting: sometimes load balancing is done across nodes that are geographically distributed (like datacenters in different regions). This variant is most commonly used by Content Delivery Networks (CDNs).

Ways to route traffic #

Round robin #

This is a fairly straightforward technique. Requests are distributed in circular sequential order across nodes. For instance, in a 3-node system, request 1 might go to Node B and request 2 to Node C, then request 3 would go to Node A, and so on.

```mermaid
graph LR
    LB((Load Balancer))
    A[Node A]
    B[Node B]
    C[Node C]
    R1[Request 1] --> LB
    R2[Request 2] --> LB
    R3[Request 3] --> LB
    LB -->|1| B
    LB -->|2| C
    LB -->|3| A
    style LB fill:#f9f,stroke:#333,stroke-width:4px
    style A fill:#dfd,stroke:#333
    style B fill:#dfd,stroke:#333
    style C fill:#dfd,stroke:#333
```

The advantage of this technique is that there’s an intuitive sense of “fairness” going from node to node.

It’s mostly used for its simplicity and when nodes are expected to behave alike. However, the downside is that if a few nodes have ended up in a state where they’re shouldering too many requests, round robin will just keep assigning requests to them as normal. There’s no intelligence towards alleviating load on already burdened nodes.
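
The rotation itself is only a few lines. Here’s a minimal sketch in Python (the node names are made up for illustration):

```python
from itertools import cycle

# Hypothetical node names for illustration.
NODES = ["node-a", "node-b", "node-c"]

class RoundRobinBalancer:
    """Cycle through nodes in a fixed order, one request at a time."""

    def __init__(self, nodes):
        self._cycle = cycle(nodes)

    def pick(self):
        return next(self._cycle)

balancer = RoundRobinBalancer(NODES)
assignments = [balancer.pick() for _ in range(6)]
# Each node gets every third request, regardless of its current load:
# ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']
```

Note that `pick()` never looks at node state – that blindness is exactly the downside described above.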

Weighted round robin #

This tends to be mostly the same as plain round robin, with a small tweak. Each node gets a proportionally greater weight based on that node’s capacity, so that larger nodes can take a larger share of incoming requests.

That does improve things a bit compared to classic round robin, but the weights are fixed – there’s no way to auto-adjust them based on changes to traffic patterns.

```mermaid
graph LR
    LB((Load Balancer))
    A[Node A<br>Weight: 3]
    B[Node B<br>Weight: 2]
    C[Node C<br>Weight: 1]
    R1[Request 1] --> LB
    R2[Request 2] --> LB
    R3[Request 3] --> LB
    R4[Request 4] --> LB
    R5[Request 5] --> LB
    R6[Request 6] --> LB
    LB -->|1,2,3| A
    LB -->|4,5| B
    LB -->|6| C
    style LB fill:#f9f,stroke:#333,stroke-width:4px
    style A fill:#dfd,stroke:#333
    style B fill:#ffd,stroke:#333
    style C fill:#ffe,stroke:#333
```
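
One simple way to implement this (there are others) is to expand each node into the rotation once per unit of weight. A sketch, using the same made-up weights as the diagram:

```python
class WeightedRoundRobinBalancer:
    """Round robin where each node appears in the rotation `weight` times."""

    def __init__(self, weighted_nodes):
        # weighted_nodes: list of (name, weight) pairs. Weights are static;
        # nothing here adapts to traffic, matching the limitation above.
        self._rotation = [name for name, w in weighted_nodes for _ in range(w)]
        self._index = 0

    def pick(self):
        node = self._rotation[self._index]
        self._index = (self._index + 1) % len(self._rotation)
        return node

balancer = WeightedRoundRobinBalancer([("node-a", 3), ("node-b", 2), ("node-c", 1)])
order = [balancer.pick() for _ in range(6)]
# ['node-a', 'node-a', 'node-a', 'node-b', 'node-b', 'node-c']
```

A production implementation would typically interleave the picks (so node-a isn’t hit three times in a row), but the proportions per cycle are the same.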

Least connections #

As the name suggests, this approach directs requests towards the node with the currently lowest number of connections.

This is great for being sensitive to changes in the burden each node is carrying – the most-burdened node will receive fewer new incoming requests.

```mermaid
graph LR
    LB((Load Balancer))
    A[Node A<br>5 connections]
    B[Node B<br>1 connection]
    C[Node C<br>2 connections]
    R1[Request 1] --> LB
    R2[Request 2] --> LB
    R3[Request 3] --> LB
    LB -->|Req 1| B
    LB -->|Req 2| B
    LB -->|Req 3| C
    subgraph Current State
        A
        B
        C
    end
    style LB fill:#f9f,stroke:#333,stroke-width:4px
    style A fill:#fdd,stroke:#333
    style B fill:#ffd,stroke:#333
    style C fill:#dfd,stroke:#333
```

The big unknown here is connection length. If requests are all fairly short and/or all take about the same processing time, then this technique works fine.

Suppose there are long-running requests mixed in among shorter ones. In that case, a few nodes may stay occupied with these long requests while the algorithm keeps assigning new requests to the other nodes. That biases traffic towards a certain subset of nodes, which reduces the “fairness” of the balancing.
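
A sketch of the bookkeeping involved, with hypothetical node names and the same starting counts as the diagram:

```python
class LeastConnectionsBalancer:
    """Send each request to whichever node has the fewest open connections."""

    def __init__(self, nodes):
        self.connections = {node: 0 for node in nodes}

    def pick(self):
        node = min(self.connections, key=self.connections.get)
        self.connections[node] += 1
        return node

    def release(self, node):
        # Called when a connection closes; without this, counts only grow.
        self.connections[node] -= 1

balancer = LeastConnectionsBalancer(["node-a", "node-b", "node-c"])
# Seed the current state from the diagram: A=5, B=1, C=2.
balancer.connections.update({"node-a": 5, "node-b": 1, "node-c": 2})
picks = [balancer.pick() for _ in range(3)]
# ['node-b', 'node-b', 'node-c'] – node-a is skipped while it stays busiest
```

The `release` callback is where the long-running-request problem shows up: a node stuck on slow requests keeps a high count for a long time, so new traffic concentrates on the rest.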

IP hash #

This technique focuses on the incoming side of the request, rather than the processing side that the previous techniques focused on.

The client’s IP address is passed through a hash function and the output is used to select a node to handle the request. Note that this means that if the same IP makes multiple requests, those will all be routed to the same node.

Clearly, the big win here is the “stickiness” of which node handles requests. This unlocks things like reusable client-server state or session persistence, which can be helpful.

```mermaid
graph LR
    C1[Req 1<br>IP W] --> LB
    C2[Req 2<br>IP Y] --> LB
    C3[Req 3<br>IP W] --> LB
    C4[Req 4<br>IP Z] --> LB
    LB((Load Balancer))
    LB -->|"Req 1<br>hash(IP W) % 3 = 1"| B
    LB -->|"Req 2<br>hash(IP Y) % 3 = 2"| C
    LB -->|"Req 3<br>hash(IP W) % 3 = 1"| B
    LB -->|"Req 4<br>hash(IP Z) % 3 = 0"| A
    A[Node A]
    B[Node B]
    C[Node C]
    style LB fill:#f9f,stroke:#333,stroke-width:4px
    style A fill:#dfd,stroke:#333
    style B fill:#dfd,stroke:#333
    style C fill:#dfd,stroke:#333
```
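
The hash-then-modulo step from the diagram is a one-liner. A sketch in Python (node names and the example IP are made up; SHA-256 stands in for whatever hash function a real balancer uses):

```python
import hashlib

# Hypothetical node pool for illustration.
NODES = ["node-a", "node-b", "node-c"]

def pick_node(client_ip: str, nodes=NODES) -> str:
    """Hash the client IP and map it onto a node index."""
    # A stable hash (rather than Python's randomized built-in hash()) so
    # the IP-to-node mapping survives process restarts.
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# The same IP always lands on the same node – the "stickiness" described above.
assert pick_node("203.0.113.7") == pick_node("203.0.113.7")
```

One caveat worth knowing: with a plain modulo scheme like this, adding or removing a node changes `len(nodes)` and reshuffles most IP-to-node mappings, which is why real systems often reach for consistent hashing instead.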

Other techniques #

There are several other approaches that use more “intelligence”. For example, they may look at past request durations by node, or use more complex analysis of traffic patterns to dynamically redistribute load.