Scalability — DevOps Engineer Track

Overview

Scalability is a system's ability to handle increasing load without degrading performance. It matters because traffic grows, and a design that works for 100 users may collapse at 100,000. The two main strategies are scaling up (vertical: bigger machines) and scaling out (horizontal: more machines).

Syntax / Usage

Vertical scaling adds more CPU, RAM, or disk to a single server. It's simple, but it has a hard ceiling (the largest machine available) and leaves a single point of failure. Horizontal scaling adds more servers behind a load balancer. It is harder to operate, but capacity grows by adding machines rather than being capped by one.

Vertical:   [ small server ]  -->  [ BIG server ]

Horizontal: [ server ]        -->  [ server ]
                                   [ server ]
                                   [ server ]  (behind a load balancer)

Horizontal scaling usually requires stateless services so any instance can serve any request.
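
To make the stateless requirement concrete, here is a minimal sketch of round-robin routing across interchangeable instances. The names StatelessServer, RoundRobinBalancer, and shared_sessions are hypothetical, and the plain dict only stands in for an external store such as a cache or database. This is not a real load balancer.

```python
from itertools import cycle

# Hypothetical shared store; in practice this would be an external
# cache or database reachable from every instance.
shared_sessions = {}


class StatelessServer:
    """An app instance that keeps no per-user state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # state lives outside the instance

    def handle(self, user_id):
        # Read and update the visit count in the shared store, not on self.
        count = self.store.get(user_id, 0) + 1
        self.store[user_id] = count
        return f"{self.name}: user {user_id} visit {count}"


class RoundRobinBalancer:
    """Sends each request to the next server in turn."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def route(self, user_id):
        return next(self._servers).handle(user_id)


servers = [StatelessServer(f"server-{i}", shared_sessions) for i in range(1, 4)]
balancer = RoundRobinBalancer(servers)

# Four requests from the same user land on different instances,
# yet the visit count stays correct because no instance owns it.
responses = [balancer.route("alice") for _ in range(4)]
for line in responses:
    print(line)
```

If each server kept its own counter instead, the fourth request would return to server-1 and report a second visit rather than a fourth, which is the failure mode that in-memory state causes.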

Examples

A startup's database is slow, so they upgrade the instance from 4 GB to 32 GB of RAM. That is vertical scaling and buys time quickly.

A photo-sharing site adds five more web servers behind a load balancer during a traffic spike, then removes them afterward. That is horizontal scaling, often automated as auto-scaling.
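
The auto-scaling decision in the second example can be sketched as a proportional rule: scale the instance count by the ratio of observed load to target load, then clamp it to a range. The function name, the CPU-percentage metric, and the default limits below are assumptions for illustration. Real autoscalers add cooldowns, tolerances, and metric smoothing.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10):
    """Return how many instances are needed to bring average CPU near target.

    current_cpu and target_cpu are average utilization percentages.
    The limits are hypothetical defaults, not recommended values.
    """
    raw = math.ceil(current_replicas * current_cpu / target_cpu)
    # Clamp so a bad metric can neither remove every server
    # nor grow the fleet without bound.
    return max(min_replicas, min(max_replicas, raw))


# Traffic spike: 2 servers at 90% CPU, target 60% -> scale out.
print(desired_replicas(2, 90, 60))
# Spike is over: 5 servers at 20% CPU -> scale back in.
print(desired_replicas(5, 20, 60))
```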

Common Mistakes

Assuming vertical scaling alone will last forever
Keeping in-memory state on servers, which blocks horizontal scaling
Scaling app servers but leaving a single database as the bottleneck
Optimizing prematurely before measuring the real bottleneck
Forgetting that more servers add coordination and consistency complexity
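
The single-database mistake can be illustrated with a deliberately simple capacity model. It assumes every request makes exactly one database call, and the request rates are made-up numbers, so treat it as a back-of-the-envelope sketch rather than a sizing tool.

```python
def max_throughput(app_servers, rps_per_app_server, db_max_rps):
    """Requests per second the whole system can sustain.

    Assumes each request hits the database once, so the system is
    limited by whichever tier saturates first.
    """
    app_capacity = app_servers * rps_per_app_server
    return min(app_capacity, db_max_rps)


# Illustrative numbers: 500 rps per app server, database tops out at 1500 rps.
for n in (2, 4, 8):
    print(n, "app servers ->", max_throughput(n, 500, 1500), "rps")
```

Under these assumptions, going from 4 to 8 app servers adds nothing because the database is already saturated. Measuring which tier is saturated comes before adding capacity anywhere.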
