Replication & Sharding

Replication = multiple copies of the same data for availability/read scaling. Sharding = splitting data across nodes for write scaling. Both are essential at scale.

Easy Technical

1 min read

**** = multiple identical copies of the same data on different nodes. Gives you (a) **availability** — if the primary dies, a replica takes over, (b) **read scale** — spread read traffic across replicas.

**** = split the data across nodes — each node owns a slice. Gives you **** — 10 shards can handle ~10× more writes than 1. Each shard is itself replicated for availability.

**Leader-follower replication** (most common): one primary handles all writes, replicas copy the primary's log asynchronously. Readers can hit replicas for load distribution. Failover = promote a replica to primary when primary dies. Examples: Postgres streaming replication, MySQL replication, MongoDB replica sets.

**Gotcha with async replication**: replica lag. User writes to primary, immediately reads from replica — might not see their own write yet (replica is a few hundred ms behind). Fix: **read-your-writes** — route that user's reads to primary for a short window; or **synchronous replication** (primary waits for replica to confirm before ack, slower writes).

Sharding keys: how do you decide which shard owns which data? Range sharding (0-1k on shard A, 1k-2k on shard B — simple, risk of hot shards if data isn't uniform). Hash sharding (hash(id) % N — uniform distribution, harder to do range queries). Directory-based (a lookup table — flexible, adds a dependency).

**Sharding is painful**. Cross-shard joins/transactions are hard. Resharding (adding nodes) requires data movement. Consider it the **last** scaling resort, after exhausting vertical + . NewSQL (Spanner, CockroachDB) hides sharding behind SQL — at an operational cost.

Diagram

Primary (RW) shard 1: keys 0-1000

Replica 1 (read-only)

Replica 2 (read-only)

Primary (RW) shard 2: keys 1001-2000

Replica 3 (read-only)

Grounded on https://martin.kleppmann.com/

Next up

Message Queues & async processing

Decouple producers from consumers with a queue in between. Smooths traffic spikes, enables async work, and isolates failures.