Scalability — vertical vs horizontal, stateless services
Scaling a system means serving more load without falling over. The choice — bigger machines or more machines — defines the rest of the architecture.
Scaling = handling more load (requests, users, data) without the system becoming slow or falling over. Two fundamental strategies: vertical (get a bigger machine) and horizontal (add more machines).
Vertical scaling = bigger box. More CPU, more RAM, bigger disks. Simple (no code changes), limited (the biggest machine is finite and expensive), single point of failure (that one beefy box goes down, you're out).
Horizontal scaling = more boxes. Load is spread across N machines; adding another = roughly linear capacity gain, as long as no shared tier (DB, cache) is saturated. Works up to Google-scale. Requires the service to be stateless (or its state to live in a shared place like a DB/cache); otherwise request 2 doesn't find what request 1 left in memory.
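A rough sizing sketch for the "more boxes" approach, assuming capacity grows linearly with instance count and each instance should run below full utilization. The function name, the 30% headroom default, and the traffic figures are illustrative assumptions, not numbers from the source.

```python
import math

def instances_needed(peak_rps, rps_per_instance, headroom=0.3):
    """Estimate instance count for a target peak load.

    Assumes linear scaling: N instances serve N times the load of one.
    `headroom` is the fraction of each instance's capacity left unused
    as a safety margin (hypothetical default of 30%).
    """
    usable_rps = rps_per_instance * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)

# Hypothetical numbers: 10,000 req/s peak, 500 req/s per instance.
small_fleet = instances_needed(10_000, 500)
# Doubling the load roughly doubles the fleet under the linear assumption.
large_fleet = instances_needed(20_000, 500)
print(small_fleet, large_fleet)
```

In practice the linear assumption holds only until a shared dependency saturates, so an estimate like this is a starting point for load testing rather than a guarantee.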
Stateless service is the key unlock. Any instance can handle any request. If the service keeps session state in process memory, you MUST pin users to the same instance (sticky sessions) — a constraint that bites at scale. Better: stateless service + session in Redis/DB.
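A minimal sketch of why in-process session state breaks behind a pool of instances, and how a shared store fixes it. The class names are hypothetical, the round-robin routing is a simplification, and a plain dict stands in for Redis or a database.

```python
import itertools

class InMemoryInstance:
    """Stateful: keeps sessions in its own process memory."""
    def __init__(self):
        self.sessions = {}

    def login(self, user):
        self.sessions[user] = {"cart": []}

    def add_to_cart(self, user, item):
        session = self.sessions.get(user)
        if session is None:
            return "error: no session on this instance"
        session["cart"].append(item)
        return "ok"

class StatelessInstance:
    """Stateless: every read and write goes to a shared store."""
    def __init__(self, store):
        self.store = store  # stand-in for Redis or a database

    def login(self, user):
        self.store[user] = {"cart": []}

    def add_to_cart(self, user, item):
        session = self.store.get(user)
        if session is None:
            return "error: no session"
        session["cart"].append(item)
        return "ok"

def run(instances):
    """Send two requests for one user to consecutive instances (round-robin)."""
    pool = itertools.cycle(instances)
    next(pool).login("alice")                      # request 1 hits instance A
    return next(pool).add_to_cart("alice", "book")  # request 2 hits instance B

stateful_result = run([InMemoryInstance(), InMemoryInstance()])

shared_store = {}
stateless_result = run([StatelessInstance(shared_store),
                        StatelessInstance(shared_store)])

print(stateful_result)
print(stateless_result)
```

With in-memory sessions the second request fails unless the load balancer pins the user to instance A; with the shared store either instance can serve it.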
The bottleneck always moves. You scale the API tier → the database becomes the bottleneck. Scale the DB → the cache becomes hot. Scale the cache → network I/O. System design = hunting and removing the current bottleneck.
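One way to picture the moving bottleneck: end-to-end throughput is capped by the slowest tier, so raising one tier's capacity simply exposes the next-lowest one. The tier names follow the text above; the capacity figures are made up for illustration.

```python
def bottleneck(capacities):
    """Return (tier, capacity) for the tier with the lowest capacity."""
    tier = min(capacities, key=capacities.get)
    return tier, capacities[tier]

# Hypothetical capacities in requests per second.
tiers = {"api": 2_000, "database": 5_000, "cache": 20_000, "network": 50_000}

before = bottleneck(tiers)   # the API tier limits the system

tiers["api"] *= 4            # scale the API tier horizontally (4x instances)
after = bottleneck(tiers)    # the database is now the limit

print(before, after)
```

Quadrupling the API tier here raises system throughput only from 2,000 to 5,000 req/s, because the database becomes the new limit.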
Scaling is a cost conversation. Horizontal is usually cheaper per-unit-of-capacity but requires engineering work. A typical path: scale vertically for the first 10x, start horizontal once code + infra are ready for it.
Grounded on https://martinfowler.com/articles/patterns-of-distributed-systems/