stackademic

The leading education platform for anyone with an interest in software development.

Scalability

How systems handle growing load through vertical and horizontal scaling

Overview

Scalability is a system's ability to handle increasing load (more users, requests, or data) without an unacceptable drop in performance. It matters because traffic tends to grow, and a design that works for 100 users may collapse at 100,000. The two main strategies are scaling up (bigger machines) and scaling out (more machines).

Syntax / Usage

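Vertical scaling adds more CPU, RAM, or disk to a single server. It is simple, but it hits a hard ceiling (the largest machine available) and leaves a single point of failure. Horizontal scaling adds more servers behind a load balancer that spreads requests across them. It is harder to build and operate, but capacity can keep growing by adding machines, and the failure of one server need not take the whole system down.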

Vertical:   [ small server ]  -->  [ BIG server ]

Horizontal: [ server ]        -->  [ server ]
                                   [ server ]
                                   [ server ]  (behind a load balancer)
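A load balancer's simplest job is to spread requests across the servers behind it. The sketch below assumes plain round-robin (each request goes to the next server in turn); the class name and server names are hypothetical, and real load balancers also offer other policies and health checks.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hypothetical balancer: hands each request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def pick(self):
        # Return the next server in order, wrapping around after the last one.
        return next(self._rotation)

    def add_server(self, server):
        # Scaling out: add a server and restart the rotation so it is included.
        self._servers.append(server)
        self._rotation = cycle(self._servers)

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([lb.pick() for _ in range(6)])
# ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
```

Callers only ever ask the balancer for a server, so adding capacity with `add_server` does not change anything on their side.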

Horizontal scaling usually requires stateless services so any instance can serve any request; session data and other per-user state move to a shared store such as a database or cache.
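To see why state matters, compare two versions of a hypothetical cart service running as two instances each. The shared `dict` stands in for an external store such as a database or cache; it is an assumption for illustration, not a real client library.

```python
class StatefulServer:
    """Keeps carts in its own memory, so only this instance knows about them."""

    def __init__(self):
        self.carts = {}

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)
        return self.carts[user]

class StatelessServer:
    """Keeps no per-user data; every request reads and writes a shared store."""

    def __init__(self, store):
        self.store = store  # stand-in for a shared database or cache

    def add_item(self, user, item):
        # Read-modify-write: not atomic, a real store would need an atomic update.
        cart = self.store.get(user, []) + [item]
        self.store[user] = cart
        return cart

# Stateful: the second request lands on another instance and item 1 is "lost".
a, b = StatefulServer(), StatefulServer()
a.add_item("alice", "book")
print(b.add_item("alice", "pen"))  # ['pen']

# Stateless: both instances see the same cart through the shared store.
shared = {}
c, d = StatelessServer(shared), StatelessServer(shared)
c.add_item("alice", "book")
print(d.add_item("alice", "pen"))  # ['book', 'pen']
```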

Examples

A startup's database is slow, so the team upgrades the database instance from 4 GB to 32 GB of RAM. That is vertical scaling, and it buys time quickly.

A photo-sharing site adds five more web servers behind a load balancer during a traffic spike, then removes them afterward. That is horizontal scaling, often automated as auto-scaling.
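Auto-scaling rules often resize the fleet in proportion to a load metric. This sketch uses a target-tracking rule in the spirit of the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * metric / target)); the function name, the 60% CPU target, and the instance limits are assumptions.

```python
import math

def desired_instances(current, cpu_percent, target_percent=60,
                      min_instances=2, max_instances=20):
    """Hypothetical rule: size the fleet so average CPU moves toward the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # Grow or shrink in proportion to how far measured load is from the target.
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp to a safe floor and a budget ceiling.
    return max(min_instances, min(max_instances, wanted))

# Traffic spike: 5 servers running hot at 90% CPU.
print(desired_instances(5, 90))  # 8
# Spike over: those 8 servers now sit at 20% CPU.
print(desired_instances(8, 20))  # 3
```

Production autoscalers usually add a cooldown or stabilization window so the fleet does not flap up and down on short bursts.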

Common Mistakes

  • Assuming vertical scaling alone will last forever
  • Keeping in-memory state on servers, which blocks horizontal scaling
  • Scaling app servers but leaving a single database as the bottleneck
  • Optimizing before measuring where the real bottleneck is
  • Forgetting that more servers add coordination and consistency complexity
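The third mistake is easy to quantify with a simplified model: if every request passes through every tier, the system can serve no more than its weakest tier allows. The function name, tier names, and per-instance capacities below are made-up numbers for illustration.

```python
def system_throughput(tiers):
    """Requests/second the whole request path can sustain, plus the limiting tier.

    tiers maps a tier name to (instance_count, requests_per_second_per_instance).
    Assumes every request hits every tier once, so the slowest tier caps the system.
    """
    capacities = {name: count * per_instance
                  for name, (count, per_instance) in tiers.items()}
    bottleneck = min(capacities, key=capacities.get)
    return capacities[bottleneck], bottleneck

before = {"app": (2, 500), "db": (1, 2000)}
after = {"app": (10, 500), "db": (1, 2000)}
print(system_throughput(before))  # (1000, 'app')
print(system_throughput(after))   # (2000, 'db')
```

Five times the app servers only doubles throughput here, because the single database becomes the cap; measuring first would show which tier actually needs scaling.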

See Also

system-design-load-balancing system-design-caching system-design-fundamentals