Dualo
System Design Essentials

Load Balancing — L4 vs L7, algorithms, health checks

The traffic cop in front of your service pool. Decides which backend handles each request, detects dead ones, and keeps things flowing.

A load balancer (LB) sits in front of your backend pool. Every request hits the LB first; the LB picks a backend and forwards the request to it. If a backend dies, the LB notices (via health checks) and stops sending traffic there.

Two main types: L4 (layer 4, TCP/UDP: fast and simple, routes on IP address and port without inspecting the payload) and L7 (layer 7, HTTP: parses the request, so it can route by URL path, header, cookie, or content).
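To make the L4 vs L7 difference concrete, here is a minimal sketch of what each kind of balancer can base its decision on. The `Request` type, the pool names, and the `X-Canary` header are hypothetical, invented for illustration; hashing only the client address and port is a simplification of what real L4 balancers do.

```python
import zlib
from dataclasses import dataclass, field

# Hypothetical pool names, chosen only for this example.
API_POOL = "api-pool"
STATIC_POOL = "static-pool"
CANARY_POOL = "canary-pool"
DEFAULT_POOL = "web-pool"


@dataclass
class Request:
    client_ip: str
    client_port: int
    path: str = "/"
    headers: dict = field(default_factory=dict)


def l4_pick(req, backends):
    """L4 view: only connection-level fields (address, port) are used."""
    key = f"{req.client_ip}:{req.client_port}".encode()
    # crc32 is stable across processes, unlike Python's built-in hash().
    return backends[zlib.crc32(key) % len(backends)]


def l7_pick_pool(req):
    """L7 view: the HTTP request is parsed, so headers and path can drive routing."""
    if req.headers.get("X-Canary") == "1":  # hypothetical header
        return CANARY_POOL
    if req.path.startswith("/api/"):
        return API_POOL
    if req.path.startswith("/static/"):
        return STATIC_POOL
    return DEFAULT_POOL
```

Note that `l4_pick` never looks at `path` or `headers`: two requests from the same address and port get the same backend regardless of what they ask for, while `l7_pick_pool` can send `/api/...` and `/static/...` to different pools.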

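Algorithms to pick a backend: round-robin (rotate 1, 2, 3, 1, 2, 3; simplest), least connections (send to the backend with the fewest active connections), IP hash (hash the client IP so the same client lands on the same backend, for stickiness), consistent hashing (a hashing scheme that remaps only a small share of clients when a backend is added or removed), weighted (give more traffic to beefier machines; usually combined with round-robin or least connections).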
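The selection algorithms above can be sketched in a few lines each. This is a minimal, single-threaded illustration, not how any particular load balancer implements them; the class and function names are made up for the example.

```python
import itertools
import random
import zlib


class RoundRobin:
    """Rotate through the backends in order: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each new connection to the backend with the fewest active ones."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the connection closes.
        self.active[backend] -= 1


def ip_hash(client_ip, backends):
    """Same client IP always maps to the same backend while the pool is unchanged."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]


def weighted_pick(weights, rng=random):
    """Random choice proportional to weight, e.g. {"big": 3, "small": 1}."""
    backends = list(weights)
    return rng.choices(backends, weights=[weights[b] for b in backends], k=1)[0]
```

The modulo in `ip_hash` is the simple form: if the pool size changes, most clients are remapped to a different backend. Consistent hashing exists to limit that remapping.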

Health checks: the LB periodically probes each backend (e.g., GET /health every 10s). A backend that fails its checks, typically several in a row, is pulled from the pool and put back once it passes again. Add a /health endpoint to your service; it should check DB connectivity and critical dependencies, not just return 200.
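Both halves of a health check can be sketched briefly: the service side, which decides what GET /health returns, and the LB side, which decides when a backend leaves or rejoins the pool. The names `health_response`, `HealthCheckedPool`, `fail_threshold`, and `rise_threshold` are assumptions made for this example, and the probe is passed in as a function so the sketch runs without a network.

```python
def health_response(checks):
    """Service side: run each dependency check; 200 only if all of them pass.

    `checks` maps a dependency name to a zero-argument function returning bool.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that raises counts as failed
    status = 200 if all(results.values()) else 503
    return status, results


class HealthCheckedPool:
    """LB side: drop a backend after repeated failures, restore it after repeated passes."""

    def __init__(self, backends, probe, fail_threshold=3, rise_threshold=2):
        self.probe = probe  # probe(backend) -> bool, e.g. wraps GET /health
        self.fail_threshold = fail_threshold  # assumed default
        self.rise_threshold = rise_threshold  # assumed default
        self.healthy = {b: True for b in backends}
        # Consecutive probe results that contradict the current state.
        self._streak = {b: 0 for b in backends}

    def run_checks(self):
        """One probing round; call this on a timer (e.g. every 10s)."""
        for backend, is_healthy in list(self.healthy.items()):
            ok = bool(self.probe(backend))
            if ok == is_healthy:
                self._streak[backend] = 0
                continue
            self._streak[backend] += 1
            needed = self.rise_threshold if ok else self.fail_threshold
            if self._streak[backend] >= needed:
                self.healthy[backend] = ok
                self._streak[backend] = 0

    def in_rotation(self):
        return [b for b, ok in self.healthy.items() if ok]
```

Requiring several consecutive results before changing state keeps one slow response from ejecting a healthy backend, and keeps one lucky response from restoring a broken one.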

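Sticky sessions pin a client to one backend (via cookie or client IP). Useful if you keep session state in memory. Bad for even distribution (load skews toward backends holding long-lived or heavy clients) and for zero-downtime deploys (restarting a backend drops the sessions pinned to it). Prefer a stateless service with session state externalized to a shared store such as Redis.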
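A small sketch shows why in-memory session state forces stickiness and why an external store removes the need for it. The `Backend` class is hypothetical, and a plain dict stands in for the shared store; in practice that role would be played by something like Redis, accessed through its client library.

```python
class Backend:
    """One service instance; `session_store` is any dict-like object."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None


def demo_in_memory():
    """Each backend keeps its own sessions: only the backend that saw the login knows the user."""
    a, b = Backend("a", {}), Backend("b", {})
    a.login("s1", "alice")
    return a.whoami("s1"), b.whoami("s1")


def demo_shared():
    """All backends read one store: any backend can serve any client."""
    shared = {}  # stand-in for an external store such as Redis
    a, b = Backend("a", shared), Backend("b", shared)
    a.login("s1", "alice")
    return a.whoami("s1"), b.whoami("s1")
```

In `demo_in_memory`, backend `b` has no record of the session, so the client must stay pinned to `a`. In `demo_shared`, the LB is free to use round-robin or least connections, and a backend can be restarted without logging anyone out.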

Common load balancers: Nginx (L7, plus L4 via its stream module; widely used), HAProxy (L4 + L7, battle-tested), AWS ALB (L7 managed), AWS NLB (L4 managed, high throughput), Envoy (L4 + L7, common as a service-mesh data plane), Cloudflare Load Balancer (global).

Grounded on https://www.nginx.com/resources/glossary/load-balancing/
