Dualo
System Design Essentials

SQL vs NoSQL — when each shines

Relational (ACID, joins, strong schema) vs document/key-value/wide-column/graph (schema flexibility, horizontal scale by default). Pick by use case, not hype.

2 min read

**Relational model properties**: normalized schemas (1NF-BCNF), declared foreign keys with referential integrity, transactional semantics across rows (ACID), general-purpose query language (SQL). Trade: schema evolution is heavier (migrations), vertical-scale dominant ( possible but operator-heavy).

Document DBs (MongoDB, Firestore, CouchDB, DynamoDB-in-document-mode): stores JSON/BSON. Schema-flexible per document (field can exist in some docs, not others). Typical access: query by doc ID or indexed fields, nested field access. Strengths: JSON payload naturally fits HTTP/JSON APIs, flexible evolution. Weaknesses: joins require app-side logic or $lookup (Mongo's awkward join), harder to enforce cross-document invariants.

Key-value (Redis, DynamoDB, Memcached): simplest model — get/put by key. Optimized for latency (< 1ms) + massive throughput. Use as: cache, session store, distributed locks, queues, counters. Redis adds data structures beyond plain KV.

**Wide-column** (Cassandra, HBase, Bigtable, ScyllaDB): each row can have many sparse columns organized into column families. Designed for huge write throughput + linearly-scalable reads by partition key. Model driven by access patterns (denormalize aggressively). Strengths: time-series, IoT, messaging history. Weaknesses: ad-hoc queries are impossible, model changes require rewriting data.

Graph DBs (Neo4j, Neptune, ArangoDB): nodes + edges with properties. Query language Cypher / Gremlin / SPARQL. Strength: traversals ('friends of friends who like X, within 3 hops'), relationship analysis, fraud rings, recommendation. Weakness: not the right hammer for everything.

**NewSQL** (CockroachDB, YugabyteDB, Spanner, TiDB): SQL interface + horizontal scale + strong consistency. Distributed Raft/Paxos-backed. More complex operationally than Postgres, but usable for globally-distributed apps needing SQL semantics. Consider when Postgres horizontal scaling becomes painful.

** applied**: all distributed DBs sit on a spectrum. CP (consistency + partition-tolerance, trade availability): Spanner, MongoDB with majority writes, HBase. AP (availability + partition-tolerance, trade strong consistency): Cassandra, DynamoDB with eventual consistency. CA doesn't exist in a real distributed system under partitions.

Decision heuristic: (i) need ACID across multiple entities? SQL. (ii) need flexible schema AND simple access-by-ID patterns? Document. (iii) need massive write throughput + known access patterns? Wide-column. (iv) relationship queries central? Graph. (v) cache / session / counter? KV. Mix them — polyglot persistence is the norm.

Grounded on https://martin.kleppmann.com/

Next up

Replication & Sharding

Replication = multiple copies of the same data for availability/read scaling. Sharding = splitting data across nodes for write scaling. Both are essential at scale.