Stateless services & horizontal scaling

To scale horizontally, any request must be servable by any instance. In-memory state breaks this — sessions, caches, and uploads must live outside the process.

Session storage: know your framework's default. Django: database-backed by default, so sessions are already shared across instances and work horizontally out of the box. Rails: cookie-based, encrypted and signed sessions by default; the session is the cookie, with no server-side storage. Express + express-session: in-memory (MemoryStore) by default, which breaks as soon as you run two or more instances; swap to connect-redis or connect-pg-simple before scaling. FastAPI + Starlette SessionMiddleware: cookie-based by default, JSON-serialized and signed (readable by the client, but not modifiable without the secret key). The defaults vary widely, so check yours before you scale.
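A minimal sketch of why per-instance session storage fails behind a load balancer. The AppInstance class and its method names are hypothetical, and a plain dict stands in for Redis or a database table; no real framework works exactly like this.

```python
import secrets
from typing import Dict, Optional


class AppInstance:
    """One application process. `store` is where it keeps session data."""

    def __init__(self, name: str, store: Dict[str, dict]) -> None:
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        # Create a session and return the ID the client keeps in a cookie.
        session_id = secrets.token_hex(16)
        self.store[session_id] = {"user": user}
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        # Look the session up in whatever store this instance was given.
        session = self.store.get(session_id)
        return session["user"] if session else None


# Case 1: each instance has its own private in-memory store.
a, b = AppInstance("a", {}), AppInstance("b", {})
sid = a.login("alice")
in_memory_result = b.whoami(sid)  # instance b never saw this session

# Case 2: both instances share one external store.
# The dict below is only a stand-in for Redis or a sessions table.
shared: Dict[str, dict] = {}
c, d = AppInstance("c", shared), AppInstance("d", shared)
sid2 = c.login("alice")
shared_result = d.whoami(sid2)  # any instance can serve the request
```

In the first case the user appears logged out whenever the load balancer picks the other instance; in the second, the instance that handles the request no longer matters.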

JWT trade-offs: stateless in the sense that the server only verifies the signature, with no session lookup, so any instance can validate a token. Costs: (a) revocation is hard: a leaked token stays valid until it expires; use short TTLs plus refresh tokens, and a revocation list for critical apps (which reintroduces a shared lookup). (b) Bloat: a large JWT is resent on every request. (c) The payload is readable: it is base64url-encoded, not encrypted, so anyone holding the token can decode it. The signature prevents tampering, not reading; only the signing key is secret. Don't put secrets in the payload. For first-party web apps, signed cookies are often simpler and safer than JWTs, because their CSRF and XSS attack surfaces are better understood.
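To make the "readable but tamper-evident" point concrete, here is a hand-rolled HS256 token using only the standard library. It is a teaching sketch, not a replacement for a maintained JWT library; the function names are made up for this example, and a real verifier should also validate the header's alg field.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def sign_token(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def read_payload_unverified(token: str) -> dict:
    # No key needed: the payload is only encoded, not encrypted.
    return json.loads(b64url_decode(token.split(".")[1]))


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    # Returns the claims if the signature matches and the token has not expired.
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        signature = b64url_decode(sig_b64)
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        return None  # expired, or no expiry claim at all
    return claims
```

Note that verify_token has no way to reject a stolen token before its exp passes. That gap is the revocation problem described above.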

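Cache coherence at scale: in-process LRU caches avoid the network entirely but are per-instance: N instances hold N independent copies, and invalidating one does nothing to the others. Redis or Memcached gives you one shared cache with TTLs and eviction policies, at the cost of a network round-trip per access. A two-tier pattern (a small L1 in-process cache in front of an L2 Redis cache, with a pub/sub channel for invalidation) is common at scale. Watch for a stampede on cache miss, where many requests recompute the same expired key at once; use cache-aside with a lock, for example a Redis SET with NX and an expiry, so that only one caller reloads the value.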
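The stampede guard can be sketched as cache-aside with a per-key lock and a second cache check once the lock is held. This version runs in a single process and uses threading.Lock as a stand-in for the distributed lock; across instances that role would fall to something like a Redis SET with NX. The CacheAside class is hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    def __init__(self, loader: Callable[[str], str], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _fresh(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
        return None

    def get(self, key: str) -> str:
        value = self._fresh(key)
        if value is not None:
            return value  # fast path: cache hit, no locking
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another caller may have filled the cache while we waited.
            value = self._fresh(key)
            if value is None:
                value = self._loader(key)
                self._data[key] = (time.monotonic() + self._ttl, value)
            return value
```

With eight threads asking for the same missing key, the loader runs once; the other seven block on the lock and then hit the cache on the re-check.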

Long-lived connections (WebSockets, SSE): a connection pins a client to one instance for its lifetime. The load balancer must support long-lived connections, and needs sticky or connection-hash routing whenever a session spans several HTTP requests (for example, Socket.IO's long-polling fallback). Scaling broadcasts across instances is the harder problem: when instance A receives a 'broadcast' command, clients connected to instance B must also get the message. Solutions: Redis pub/sub, Phoenix Channels with a Redis adapter, Socket.IO with the Redis adapter, or an external bus or hosted service (Kafka, Pulsar, Ably). Without one of these, your WebSocket app is effectively single-instance.
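The cross-instance fan-out can be sketched with a tiny in-memory bus standing in for Redis pub/sub. Bus and SocketInstance are hypothetical names, and client "connections" are modelled as lists of delivered messages rather than real sockets.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class Bus:
    """In-memory stand-in for Redis pub/sub or another external bus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subscribers[channel]):
            callback(message)


class SocketInstance:
    """One server process holding its own set of client connections."""

    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        self.clients: Dict[str, List[str]] = {}  # client id -> messages delivered
        bus.subscribe("broadcast", self._deliver_locally)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def _deliver_locally(self, message: str) -> None:
        # Each instance only ever writes to the sockets it owns.
        for inbox in self.clients.values():
            inbox.append(message)

    def broadcast(self, message: str) -> None:
        # Publish to the bus instead of looping over self.clients directly,
        # so that every instance (including this one) delivers the message.
        self.bus.publish("broadcast", message)
```

If broadcast() looped over self.clients instead, only clients on the receiving instance would see the message, which is the single-instance trap described above.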

Background jobs are state you forgot: if send_welcome_email() runs inside the HTTP request, it ties up a worker, can be killed half-way through by a deploy, and has to be retried by hand. Move it to a job queue: Celery, Sidekiq, BullMQ, or RQ, backed by Redis (Celery also supports RabbitMQ). The request enqueues the job; dedicated worker processes consume it. Decoupling request time from work time is a prerequisite for automatic retries and any delivery guarantee.
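The enqueue-then-consume split looks roughly like this. queue.Queue and a thread stand in for a real broker and a separate worker process; handle_signup and start_worker are hypothetical names. Unlike Celery or Sidekiq, this sketch neither persists jobs nor retries them.

```python
import queue
import threading
from typing import List, Tuple

# Stand-in for Redis or RabbitMQ: in production the queue lives outside the process.
jobs: "queue.Queue[Tuple[str, str]]" = queue.Queue()
sent: List[str] = []  # records "sent" emails so the effect is observable


def send_welcome_email(address: str) -> None:
    # Placeholder for the slow work; a real job would call a mail provider.
    sent.append(address)


def handle_signup(address: str) -> dict:
    # Request path: enqueue and return immediately, without doing the work.
    jobs.put(("send_welcome_email", address))
    return {"status": 202, "queued": True}


def worker() -> None:
    # Worker path: runs outside the request cycle, one job at a time.
    while True:
        name, arg = jobs.get()
        try:
            if name == "send_welcome_email":
                send_welcome_email(arg)
        finally:
            jobs.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The request handler's latency no longer depends on the mail provider, and a deploy that restarts web instances leaves queued jobs to the workers.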

File uploads: saving to local disk (/tmp/upload.csv) works until you add a second instance or a deploy replaces the machine. Pattern: (a) the client uploads directly to S3/GCS via a presigned URL, so the server never handles the bytes; (b) the server stores only the URL or object key in the DB. Serverless platforms such as Lambda and Vercel go further: the only writable disk is ephemeral scratch space tied to one execution environment. Files may survive between warm invocations, but that is never guaranteed, and they are never shared across instances.
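The idea behind a presigned URL can be shown with a toy HMAC scheme. This is not S3's or GCS's actual signing algorithm (in practice you would call the provider's SDK); the host name, the function names, and the signed string format are all invented for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

STORAGE_HOST = "https://uploads.example.com"  # hypothetical storage endpoint


def _sign(key: str, expires: int, secret: bytes) -> str:
    message = f"PUT\n{key}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def presign_upload(key: str, secret: bytes, ttl_seconds: int,
                   now: Optional[float] = None) -> str:
    # App server side: grant permission to upload one object for a limited time.
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _sign(key, expires, secret)})
    return f"{STORAGE_HOST}/{key}?{query}"


def storage_accepts(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    # Storage side: recompute the signature from the URL and compare.
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    expected = _sign(key, expires, secret)
    return hmac.compare_digest(signature.encode(), expected.encode()) and current < expires
```

The app server hands the URL to the client and records only the object key (for example "uploads/42/report.csv") in its database; the file bytes travel from the client to storage without passing through any app instance.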

The shared-nothing ideal: each instance knows only the database, caches, and queues it talks to over the network. In-process state is either per-request scratch (fine) or a locally cached copy of externally authoritative data (invalidated via TTL or pub/sub). The discipline keeps paying off: deploys can be rolling, scaling out means adding instances, and the loss of a single instance is largely invisible to users.

Grounded on https://12factor.net/processes
