Load Balancing

Distributing incoming traffic across multiple servers for reliability and scale

Overview

A load balancer distributes incoming requests across a pool of servers so no single server is overwhelmed. It matters because it enables horizontal scaling and improves availability by routing around failed servers. It's a foundational piece of almost any high-traffic system.

Syntax / Usage

Clients send requests to the load balancer's address, and it forwards each request to a healthy backend server based on a routing algorithm. Health checks let it stop sending traffic to servers that are down.

                       --> [ Server A ]
[ Clients ] --> [ LB ] --> [ Server B ]
                       --> [ Server C ]

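Common algorithms: round robin (rotate through the servers in order), least connections (send to the server with the fewest active connections), and IP hash (hash the client's IP address so the same client keeps reaching the same server).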
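To make the three algorithms concrete, here is a minimal Python sketch of each selection rule. The class names (RoundRobin, LeastConnections, IPHash) and the pick/release methods are hypothetical, chosen for illustration; a real load balancer also has to deal with concurrency, server weights, and health state.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Rotate through the servers in order."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a fixed server while the pool is unchanged."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


servers = ["A", "B", "C"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['A', 'B', 'C', 'A']
```

Note that with this simple modulo scheme, adding or removing a server remaps most clients to a different server. Consistent hashing is the usual way to limit that, but it is outside the scope of this sketch.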

Examples

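An nginx upstream block defines a pool of three app servers. With no algorithm specified, nginx uses round robin, and its passive health checks temporarily skip a server after failed connection attempts: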

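# Goes inside the http { } context. Default algorithm: round robin.
upstream app {
  server 10.0.0.1;
  server 10.0.0.2;
  server 10.0.0.3;
}

server {
  location / { proxy_pass http://app; }  # forward requests to the pool
}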

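A cloud provider's load balancer might run health checks every 10 seconds and remove an unhealthy instance from rotation, so users stop being routed to the broken server once the checks fail. Requests that arrive between the failure and its detection can still hit that server.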
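The removal logic can be sketched in a few lines of Python. This is an illustrative model, not any provider's actual behavior: the Pool class, its method names, and the unhealthy_after threshold of two consecutive failures are all assumptions.

```python
class Pool:
    """Tracks which backends are in rotation based on health-check results."""

    def __init__(self, servers, unhealthy_after=2):
        # unhealthy_after: consecutive failed checks before removal (assumed value)
        self.unhealthy_after = unhealthy_after
        self.failures = {server: 0 for server in servers}

    def record_check(self, server, ok):
        # A passing check resets the counter; a failing one increments it.
        self.failures[server] = 0 if ok else self.failures[server] + 1

    def in_rotation(self):
        return [s for s, n in self.failures.items() if n < self.unhealthy_after]


pool = Pool(["A", "B", "C"])
pool.record_check("B", ok=False)
print(pool.in_rotation())  # ['A', 'B', 'C']  one failure is not enough
pool.record_check("B", ok=False)
print(pool.in_rotation())  # ['A', 'C']       B removed after two failures
pool.record_check("B", ok=True)
print(pool.in_rotation())  # ['A', 'B', 'C']  B returns once it passes again
```

Requiring several consecutive failures avoids ejecting a server over a single dropped probe, at the cost of slower detection. Many real load balancers also require several consecutive successes before returning a server to rotation, which this sketch simplifies to one.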

Common Mistakes

  • Making the load balancer itself a single point of failure (use redundancy)
  • Requiring sticky sessions because servers hold local state (keep session state in a shared store)
  • Skipping health checks, so traffic still flows to dead servers
  • Assuming even distribution when requests vary greatly in cost
  • Ignoring the load balancer as a capacity bottleneck of its own
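The uneven-distribution mistake is easy to see with a small simulation. The sketch below is a toy model under stated assumptions: the assign function and the request costs are made up, and "least_loaded" tracks the total work assigned so far as a stand-in for a load-aware algorithm such as least connections.

```python
def assign(costs, n_servers, strategy):
    """Return the total work per server for a sequence of request costs."""
    load = [0] * n_servers
    for i, cost in enumerate(costs):
        if strategy == "round_robin":
            target = i % n_servers
        else:  # "least_loaded": pick the server with the least work so far
            target = load.index(min(load))
        load[target] += cost
    return load


# Every third request is ten times as expensive as the others.
costs = [10, 1, 1, 10, 1, 1, 10, 1, 1]
print(assign(costs, 3, "round_robin"))   # [30, 3, 3]
print(assign(costs, 3, "least_loaded"))  # [12, 11, 13]
```

Round robin gives each server exactly three requests, yet one server ends up with ten times the work of the others. Equal request counts do not imply equal load.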

See Also

  • system-design-scalability
  • system-design-client-server
  • system-design-fundamentals