Database Sharding
Splitting a dataset across multiple databases to scale writes and storage
Overview
Sharding partitions a single logical dataset across multiple independent databases (shards), each holding a subset of the rows. It scales writes and storage beyond what one machine can handle, unlike read replicas which only scale reads. The cost is significant complexity: cross-shard queries, transactions, and rebalancing all become harder, so sharding should be a last resort after vertical scaling, caching, and replication.
Syntax / Usage
A shard key determines which shard a row lives on. A routing layer maps the key to a shard, usually via hashing or ranges. Consistent hashing minimizes how much data must move when shards are added or removed.
shard_for(key, num_shards):
return hash(key) % num_shards # hash-based: even spread, no range scans
Ranges: users A-H -> shard1, I-P -> shard2, Q-Z -> shard3
Directory: lookup table maps key -> shard (flexible, extra hop)
[ App ] --> [ Router ] --> shard1 | shard2 | shard3
Choose a shard key with high cardinality and even access so no single shard becomes a hotspot. Modulo hashing forces a full reshuffle when num_shards changes, which is why consistent hashing or virtual buckets are preferred.
Examples
A social app shards the posts table by user_id, so all of a user's posts live together and a profile page reads from one shard.
A multi-tenant SaaS uses a directory shard map keyed by tenant_id, letting a huge tenant be moved to a dedicated shard without changing the hashing scheme.
A time-series system range-shards by month, keeping recent hot data on fast storage and archiving old shards cheaply—accepting that "last 90 days" queries may fan out across a few shards.
Common Mistakes
- Picking a low-cardinality or skewed shard key, creating hotspots (e.g. sharding by country)
- Using
hash % N, forcing a massive data migration whenever shard count changes - Assuming cross-shard joins and transactions work like single-node ones (they need scatter-gather or sagas)
- Sharding prematurely instead of first trying replicas, caching, and bigger hardware
- Forgetting that unique constraints and auto-increment IDs no longer hold across shards
See Also
system-design-databases system-design-scalability system-design-cap-theorem