Load Balancing — L4 vs L7, algorithms, health checks
The traffic cop in front of your service pool. Decides which backend handles each request, detects dead ones, and keeps things flowing.
A load balancer sits in front of your backend pool. Every request hits the LB first; the LB picks a backend and forwards. If a backend dies, the LB notices (via health checks) and stops sending traffic there.
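To make the "pick a backend, skip dead ones" idea concrete, here is a minimal sketch of the bookkeeping an LB might do. The class name BackendPool and its methods are hypothetical, and the rotation order is just one possible choice; real load balancers do this per connection or per request with far more care.

```python
class BackendPool:
    """Tracks which backends are healthy and picks one per request."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)  # everything starts in the pool
        self._next = 0

    def mark_down(self, backend):
        """Called when a health check fails: stop sending traffic there."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Called when a known backend passes its health check again."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation, or raise if none is left."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

With three backends and the second one marked down, repeated calls to pick() alternate between the first and third until mark_up() puts the second back in rotation.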
Two main types: L4 (layer 4, TCP/UDP — fast, dumb, no inspection of the content) and L7 (layer 7, HTTP — can route by URL path, header, cookie, or content).
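The difference shows up in what information the routing decision can use. The sketch below is illustrative only: the function names, the pool names, and the X-Canary header are all made up for the example. The L4 function sees just connection metadata, while the L7 function sees the parsed HTTP request.

```python
import hashlib


def route_l4(src_ip: str, src_port: int, backends: list) -> str:
    """L4: only connection metadata is visible, so pick by hashing it."""
    key = f"{src_ip}:{src_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(backends)
    return backends[index]


def route_l7(path: str, headers: dict, pools: dict) -> str:
    """L7: the HTTP request is parsed, so path and headers can drive routing."""
    if headers.get("X-Canary") == "1":  # hypothetical header for canary traffic
        return pools["canary"]
    if path.startswith("/api/"):
        return pools["api"]
    if path.startswith("/static/"):
        return pools["static"]
    return pools["default"]


# Hypothetical pool names, purely for illustration.
POOLS = {
    "canary": "canary-pool",
    "api": "api-pool",
    "static": "static-pool",
    "default": "web-pool",
}
```

Here route_l7("/api/users", {}, POOLS) lands on the API pool, while route_l4 could not tell that request apart from any other on the same connection.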
Algorithms to pick a backend: round-robin (rotate 1, 2, 3, 1, 2, 3 — simplest), least connections (send to the backend with the fewest active connections), IP hash / consistent hashing (same client → same backend — for stickiness), weighted (give more traffic to beefier machines).
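Each of these algorithms fits in a few lines. The following is a rough sketch, not how any particular load balancer implements them; the backend names and function names are placeholders. Note that the hash example uses plain modulo hashing, which remaps most clients when the pool size changes. True consistent hashing (a hash ring) avoids that and is not shown here.

```python
import hashlib
import itertools

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


def round_robin(backends):
    """Rotate through the pool in order: 1, 2, 3, 1, 2, 3, ..."""
    return itertools.cycle(backends)


def least_connections(active):
    """Pick the backend with the fewest active connections.

    `active` maps backend name -> current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip, backends=BACKENDS):
    """Map the same client IP to the same backend (simple modulo hashing)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def weighted_round_robin(weights):
    """Rotate through backends, repeating each one `weight` times per cycle.

    `weights` maps backend name -> integer weight.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)
```

For example, weighted_round_robin({"big": 3, "small": 1}) hands "big" three of every four requests. This naive expansion sends them back to back; production implementations usually interleave the picks more smoothly.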
Health checks: the LB periodically probes each backend (e.g., GET /health every 10s). A backend that fails its checks is pulled from the pool. Add a /health endpoint to your service; it should check DB connectivity and critical dependencies, not just return 200.
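A /health endpoint along those lines could look like the sketch below, using only the Python standard library. The probe functions check_database and check_cache are hypothetical stubs; in a real service they would run something cheap like a SELECT 1 or a cache ping. Returning 503 on failure is a common convention, assumed here rather than taken from any specific LB's requirements.

```python
import json
from http.server import BaseHTTPRequestHandler


def check_database() -> bool:
    """Hypothetical probe: replace with a cheap query such as SELECT 1."""
    return True


def check_cache() -> bool:
    """Hypothetical probe for another critical dependency."""
    return True


CHECKS = {"database": check_database, "cache": check_cache}


def health_report(checks):
    """Run every check; return (HTTP status, per-dependency results)."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = health_report(CHECKS)
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with http.server.HTTPServer(("", 8080), HealthHandler).serve_forever(); the port is arbitrary. Keeping the probe logic in health_report, separate from the handler, makes it easy to unit test without opening a socket.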
Sticky sessions pin a client to one backend (via cookie or client IP). Useful if you keep session state in memory. Bad for even distribution and zero-downtime deploys. Prefer a stateless service with session state externalized (e.g., in Redis).
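Here is a small sketch of why externalized sessions remove the need for stickiness. SharedSessionStore is an in-process stand-in for an external store such as Redis (it is not a Redis client), and the Backend class and cart example are invented for illustration.

```python
class SharedSessionStore:
    """In-process stand-in for an external session store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def set(self, session_id, session):
        self._data[session_id] = dict(session)


class Backend:
    """A stateless backend: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared across backends, not per-backend memory

    def handle(self, session_id, item):
        """Add an item to the caller's cart and return the cart so far."""
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.set(session_id, {"cart": cart})
        return cart
```

If two Backend instances share one store, a request for session "s1" can land on either of them and still see the same cart, so the LB is free to distribute requests evenly and backends can be restarted without losing sessions.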
Common load balancers: Nginx (mainly L7, with L4 support via its stream module; very widely deployed), HAProxy (L4 + L7, battle-tested), AWS ALB (L7 managed), AWS NLB (L4 managed, high throughput), Envoy (modern, service mesh native), Cloudflare Load Balancer (global).
Grounded on https://www.nginx.com/resources/glossary/load-balancing/