Load Balancing
Distributing incoming traffic across multiple servers for reliability and scale
Overview
A load balancer distributes incoming requests across a pool of servers so no single server is overwhelmed. It matters because it enables horizontal scaling and improves availability by routing around failed servers. It's a foundational piece of almost any high-traffic system.
Syntax / Usage
Clients send requests to the load balancer's address, and it forwards each request to a healthy backend server based on a routing algorithm. Health checks let it stop sending traffic to servers that are down.
                       --> [ Server A ]
[ Clients ] --> [ LB ] --> [ Server B ]
                       --> [ Server C ]
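As a rough illustration of the flow above, here is a minimal Python sketch of a balancer that only forwards to backends currently marked healthy. The Backend and LoadBalancer classes and the healthy flag are hypothetical: a real load balancer proxies network connections, and its health checks would be what flips the flag.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Backend:
    name: str
    healthy: bool = True  # hypothetical flag that health checks would update


@dataclass
class LoadBalancer:
    backends: list[Backend]
    _turn: count = field(default_factory=count)  # ever-increasing request counter

    def pick(self) -> Backend:
        """Round robin over the backends that are currently marked healthy."""
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return healthy[next(self._turn) % len(healthy)]

    def forward(self, request: str) -> str:
        backend = self.pick()
        # A real balancer would open a connection and relay bytes; here we only tag the request.
        return f"{backend.name} handled {request}"
```

With backend B marked unhealthy, successive requests alternate between A and C. Because the rotation runs over the healthy subset, the order shifts whenever a backend goes down or comes back, which is usually acceptable.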
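Common algorithms: round robin (rotate through servers in order), least connections (send each request to the server with the fewest open connections), and IP hash (hash the client's IP so the same client keeps reaching the same server).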
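A minimal sketch of the three algorithms, assuming backends are identified by plain strings and that the caller reports when a request finishes (least connections needs that to keep its counts accurate). The class names are illustrative and not taken from any library.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Rotate through the servers in a fixed order."""

    def __init__(self, servers: list[str]) -> None:
        self._it = cycle(servers)

    def pick(self, client_ip: str) -> str:
        return next(self._it)


class LeastConnections:
    """Send each request to the server with the fewest open connections."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip: str) -> str:
        server = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


class IPHash:
    """Hash the client IP so the same client keeps landing on the same server."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers

    def pick(self, client_ip: str) -> str:
        # md5 is used only as a stable, well-spread hash, not for security.
        digest = hashlib.md5(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]
```

The IP hash uses md5 instead of Python's built-in hash() because the built-in string hash is randomized per process, so it would not map a client to the same server across restarts. Plain modulo hashing also remaps most clients whenever the pool size changes; consistent hashing is the usual remedy.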
Examples
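An nginx upstream block spreads traffic across three app servers (round robin by default). Open-source nginx checks health passively: when requests to a server fail, it marks that server unavailable for a while and sends traffic to the others, governed by the max_fails and fail_timeout parameters of the server directive (defaults: 1 failure, 10 seconds). Active, periodic health checks (the health_check directive) require NGINX Plus.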
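# Inside http {}. Round robin by default; after max_fails (default 1) failed attempts a server is skipped for fail_timeout (default 10s).
upstream app {
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
server {
    listen 80;
    location / { proxy_pass http://app; }
}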
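The passive behavior can be modeled roughly as follows. This is a simplified sketch of the documented max_fails / fail_timeout semantics (max_fails failures within fail_timeout seconds make the server unavailable for the next fail_timeout seconds), not nginx's actual implementation. The PassiveTracker class is hypothetical and takes explicit timestamps so it is easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveTracker:
    """Hypothetical model of passive health tracking for one upstream server."""

    max_fails: int = 1
    fail_timeout: float = 10.0
    _failures: list[float] = field(default_factory=list)
    _down_until: float = 0.0

    def available(self, now: float) -> bool:
        # The server is skipped until its penalty period has elapsed.
        return now >= self._down_until

    def record_failure(self, now: float) -> None:
        # Keep only failures inside the current window, then count this one.
        self._failures = [t for t in self._failures if now - t < self.fail_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_fails:
            self._down_until = now + self.fail_timeout
            self._failures.clear()
```

With max_fails=3 and fail_timeout=30, failures at t=0, 5 and 10 take the server out of rotation until t=40, while two failures spaced further apart than the window never trip it.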
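A cloud provider's load balancer might run health checks every 10 seconds and remove an instance from rotation once it fails them. After that, users stop hitting the broken server, but requests that arrive before the failure is detected (up to roughly the check interval times the number of failed checks required) can still reach it.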
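The gap before detection can be estimated with a small calculation. The sketch assumes checks fire at fixed intervals, complete instantly, and that an instance is removed after `threshold` consecutive failed checks; `threshold` and the `detection_delay` function are hypothetical, though many providers expose a similar setting.

```python
import math


def detection_delay(fail_at: float, interval: float, threshold: int,
                    first_check: float = 0.0) -> float:
    """Seconds from a backend failing until it is pulled from rotation.

    Hypothetical model: checks run at first_check, first_check + interval, ...,
    a check at or after fail_at fails, and removal happens on the
    `threshold`-th consecutive failed check. Assumes fail_at >= first_check.
    """
    # Index of the first check that runs at or after the failure.
    k = math.ceil((fail_at - first_check) / interval)
    removal_time = first_check + (k + threshold - 1) * interval
    return removal_time - fail_at
```

With a 10-second interval and a threshold of 2, a failure at t=0.5 (just after a check) is not acted on until t=20, a 19.5-second window. The worst case is therefore close to interval times threshold.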
Common Mistakes
- Making the load balancer itself a single point of failure (use redundancy)
- Keeping session state in each server's memory, which forces sticky sessions and loses sessions when a server dies (move state to a shared store)
- Skipping health checks, so traffic still flows to dead servers
- Assuming even distribution when request costs vary widely (round robin balances request counts, not work)
- Ignoring the load balancer as a capacity bottleneck of its own
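To see why uneven request costs matter, the sketch below totals the work each server receives under round robin versus sending each request to the currently least-loaded server. It assumes every request's cost is known when it arrives, which real balancers rarely know; least connections approximates this by counting in-flight requests instead. Both function names are illustrative.

```python
from itertools import cycle


def round_robin_load(costs: list[int], n_servers: int) -> list[int]:
    """Total work per server when requests are dealt out in strict rotation."""
    load = [0] * n_servers
    for cost, server in zip(costs, cycle(range(n_servers))):
        load[server] += cost
    return load


def least_loaded_load(costs: list[int], n_servers: int) -> list[int]:
    """Total work per server when each request goes to the least-loaded server.

    Idealized: assumes each request's cost is known on arrival.
    """
    load = [0] * n_servers
    for cost in costs:
        server = load.index(min(load))  # ties go to the lowest index
        load[server] += cost
    return load
```

For costs [9, 1, 1] repeated three times on three servers, round_robin_load returns [27, 3, 3]: the request counts are equal, but one server does 27 units of work while the others do 3. least_loaded_load returns [11, 10, 12].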
See Also
system-design-scalability system-design-client-server system-design-fundamentals