Dualo
System Design Essentials

Consistency Models — CAP, PACELC, strong vs eventual

Distributed systems must pick their consistency guarantees. CAP is the headline; the real design lives in eventual, causal, and read-your-writes nuances.

2 min read

CAP theorem formally: under Partition (network split between replicas), a system can either (a) preserve Consistency by rejecting some operations (unavailable nodes return errors) → CP system, or (b) preserve Availability by serving requests on all nodes and accepting divergence → AP system. You cannot have both under partition. CA is trivial (single-node, no partition possible).

PACELC (Abadi): a more useful framing. Partition → A or C. Else (no partition) → Latency or C. Example: Dynamo is PA/EL (partition-available, normally optimize for latency). Spanner is PC/EC (partition-consistent, normally consistent at latency cost). MongoDB (majority writes) is PC/EC.

**Strong consistency** implementations: (i) **linearizability** — reads always see the most recent write, as if all ops serialize on a global clock; requires consensus (Paxos / Raft) for writes; (ii) **sequential consistency** — all ops appear in SOME total order (same for every client) but not necessarily real-time order; (iii) **serializability** (DB term) — transactions equivalent to some serial execution.

Eventual consistency variants: causal consistency (Lamport clocks / vector clocks — writes with causal dependency arrive in order; concurrent writes may conflict); read-your-writes (client sees its own writes immediately); monotonic reads (same client never sees state move backward); monotonic writes (writes from same client apply in order); writes-follow-reads (a write applies after all reads that preceded it from the same client's perspective).

Consistency is per-operation, not per-system. A mature system lets the client choose: Cassandra's CONSISTENCY LEVEL ONE / QUORUM / ALL, MongoDB's readConcern: local / majority / linearizable, DynamoDB's ConsistentRead: true/false. Per-operation choice = precise cost/latency tradeoff.

**** (Dynamo-style): `W + R > N` guarantees read sees latest write (N replicas, W for write quorum, R for read quorum). Common configs: N=3, W=2, R=2 (strong); N=3, W=3, R=1 (write-heavy latency cost, fast reads); N=3, W=1, R=1 (AP extreme, may miss writes).

Consensus algorithms: Paxos (original, notoriously hard to implement correctly), Raft (modern, easier to understand and implement — used by etcd, Consul, CockroachDB), Zab (ZooKeeper's consensus). All solve distributed agreement in presence of failures; all require majority (f+1 of 2f+1 nodes alive).

Session guarantees for client-friendly consistency: (i) client tracks a version/LSN after each write; (ii) reads specify 'must be ≥ this LSN'; (iii) replicas that haven't caught up refuse (fall back to leader). Implementation in Postgres via pg_last_wal_replay_lsn() + wait_for_lsn.

Grounded on https://www.mongodb.com/docs/manual/reference/read-concern/

Next up

Observability — logs, metrics, traces, SLOs

The three pillars (logs, metrics, traces) tell you WHAT broke. SLIs/SLOs tell you if it MATTERS. You need both.