Scalability
How systems handle growing load through vertical and horizontal scaling
Overview
Scalability is a system's ability to handle increasing load without degrading performance. It matters because traffic grows and a design that works for 100 users may collapse at 100,000. The two main strategies are scaling up (bigger machines) and scaling out (more machines).
Syntax / Usage
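Vertical scaling (scaling up) adds more CPU, RAM, or disk to a single server. It is simple, but it has a hard ceiling (the largest machine available) and the server remains a single point of failure. Horizontal scaling (scaling out) adds more servers behind a load balancer. It is harder to build, but capacity can keep growing by adding machines, and losing one machine no longer takes the whole service down.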
Vertical:    [ small server ] --> [ BIG server ]
Horizontal:  [ server ]       --> [ server ]
                                  [ server ]
                                  [ server ]   (behind a load balancer)
Horizontal scaling usually requires stateless services so any instance can serve any request.
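To make the stateless requirement concrete, here is a minimal Python sketch. It assumes round-robin routing and uses an in-memory dictionary as a stand-in for a shared session store such as Redis or a database; the class names SessionStore, AppServer, and RoundRobinBalancer are hypothetical. Because the cart lives in the shared store rather than on any one server, consecutive requests from the same user can land on different instances and still see the same data.

```python
import itertools
from dataclasses import dataclass


class SessionStore:
    """Hypothetical stand-in for an external store (e.g. Redis) shared by all instances."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def get(self, session_id: str) -> dict:
        # Create an empty session on first access.
        return self._data.setdefault(session_id, {})


@dataclass
class AppServer:
    """A stateless instance: it keeps no per-user data between requests."""
    name: str
    store: SessionStore

    def handle(self, session_id: str, item: str) -> tuple[str, list[str]]:
        # All user state is read from and written to the shared store.
        session = self.store.get(session_id)
        cart = session.setdefault("cart", [])
        cart.append(item)
        return self.name, list(cart)


class RoundRobinBalancer:
    """Hands each request to the next instance in turn."""

    def __init__(self, servers: list[AppServer]) -> None:
        self._cycle = itertools.cycle(servers)

    def route(self, session_id: str, item: str) -> tuple[str, list[str]]:
        return next(self._cycle).handle(session_id, item)


store = SessionStore()
lb = RoundRobinBalancer([AppServer(f"server-{i}", store) for i in range(3)])
for item in ["book", "pen", "lamp"]:
    print(lb.route("user-42", item))
```

Each request goes to a different server (server-0, server-1, server-2), yet the cart grows from ['book'] to ['book', 'pen', 'lamp'] because every instance reads the same session. If each AppServer kept the cart in its own dictionary instead, the user would see a different cart on every request. Real load balancers also offer other policies, such as least-connections, and "sticky sessions" can pin a user to one instance, at the cost of some of this flexibility.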
Examples
A startup's database is slow, so the team upgrades the instance from 4 GB to 32 GB of RAM. This is vertical scaling, and it buys time quickly.
A photo-sharing site adds five more web servers behind a load balancer during a traffic spike, then removes them afterward. This is horizontal scaling; when servers are added and removed automatically based on load metrics, it is called auto-scaling.
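An auto-scaler needs a rule for how many servers to run. The sketch below uses a proportional target-tracking rule, the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current * current_utilization / target_utilization), clamped to a minimum and maximum. The function name, default bounds, and utilization figures are hypothetical.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Size the fleet so average utilization moves toward the target.

    Proportional rule: desired = ceil(current * current_util / target_util),
    then clamped to [min_replicas, max_replicas].
    """
    if current < 1 or target_util <= 0:
        raise ValueError("need at least one replica and a positive target")
    desired = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical spike: 3 servers at 90% CPU against a 50% target -> scale out.
print(desired_replicas(3, 0.90, 0.50))
# Hypothetical quiet period: 8 servers at 20% CPU -> scale back in.
print(desired_replicas(8, 0.20, 0.50))
```

The first call asks for 6 servers and the second for 4. Production autoscalers typically add a tolerance band and cooldown or stabilization periods on top of a rule like this so the fleet does not flap up and down on every small fluctuation.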
Common Mistakes
- Assuming vertical scaling alone will last forever
- Keeping in-memory state on servers, which blocks horizontal scaling
- Scaling app servers but leaving a single database as the bottleneck
- Optimizing before measuring where the real bottleneck is
- Forgetting that more servers add coordination and consistency complexity
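The single-database mistake can be seen with a back-of-the-envelope capacity model. This sketch assumes every request makes exactly one database call, so the system can serve no more than its slowest tier allows; the function names and the throughput figures are hypothetical, chosen only to show the shape of the curve.

```python
def system_throughput(app_servers: int, per_server_rps: float,
                      db_capacity_rps: float) -> float:
    """Requests/second the whole system can serve.

    Simplified model (an assumption): one database call per request, so
    throughput is capped by whichever tier saturates first.
    """
    return min(app_servers * per_server_rps, db_capacity_rps)


def bottleneck(app_servers: int, per_server_rps: float,
               db_capacity_rps: float) -> str:
    """Name the tier that limits throughput under the same model."""
    if app_servers * per_server_rps >= db_capacity_rps:
        return "database"
    return "app tier"


# Hypothetical figures: 500 req/s per app server, 2,000 req/s for the single database.
for n in (1, 2, 4, 8, 16):
    print(n, system_throughput(n, 500, 2000), bottleneck(n, 500, 2000))
```

Throughput rises with each app server until four servers, then stays flat at the database's limit: the extra servers add cost without adding capacity. Measuring each tier first, rather than guessing, shows where the next scaling effort belongs.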
See Also
system-design-load-balancing system-design-caching system-design-fundamentals