Dualo
Backend Architectures Deep Dive

Concurrency models — threads, events, goroutines

How a backend handles many simultaneous requests. Thread-per-request, event loop, goroutines, worker pools: each has a radically different scaling ceiling.

Every backend answers one core question: 'While I'm waiting for the database, can I handle another request?' The answer defines its concurrency model and, with it, its scaling ceiling.

Thread-per-request (Django in sync mode, Rails on Puma, Spring MVC): each in-flight request occupies its own OS thread, usually drawn from a bounded pool. Simple to reason about: one thread, one request, one stack trace. Expensive: each thread reserves its own stack (often around 1 MB by default), and once a process is juggling a few thousand threads, context switching starts to eat a significant share of CPU time.
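
A minimal sketch of the thread-per-request idea, using only the Python standard library. The names handle_request and serve_thread_per_request are hypothetical, and time.sleep stands in for a blocking database call; real servers normally cap the number of threads with a pool rather than spawning one per request without limit.

```python
import threading
import time


def handle_request(request_id: int, results: dict) -> None:
    # Simulated blocking I/O (stand-in for a synchronous database call).
    time.sleep(0.05)
    # Record which thread served this request.
    results[request_id] = threading.current_thread().name


def serve_thread_per_request(n_requests: int) -> tuple[dict, float]:
    """Start one OS thread per request and wait for all of them."""
    results: dict = {}
    start = time.perf_counter()
    threads = [
        threading.Thread(target=handle_request, args=(i, results), name=f"req-{i}")
        for i in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


results, elapsed = serve_thread_per_request(20)
```

Because every request blocks on its own thread, the 20 simulated waits overlap and the total time stays far below the 1 second that serial handling would take. The cost is one stack per thread, which is what limits this model at high concurrency.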

Event loop (Node.js, FastAPI on Uvicorn, Python asyncio, Tornado): one thread juggles thousands of requests, switching to another whenever the current one is waiting on I/O. A single process can hold on the order of 10k–100k concurrent I/O-bound requests. The catch: any blocking call (a synchronous DB driver, CPU-heavy work) stalls every pending request on that loop.
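
The stall caused by a blocking call can be sketched with asyncio. This is an illustration under simple assumptions, not a benchmark: handle_request and serve are hypothetical names, asyncio.sleep stands in for non-blocking I/O, and time.sleep stands in for a synchronous driver call.

```python
import asyncio
import threading
import time


async def handle_request(request_id: int, blocking: bool) -> str:
    if blocking:
        # Synchronous call: the event loop cannot run anything else meanwhile.
        time.sleep(0.05)
    else:
        # Awaitable call: control returns to the loop while this request waits.
        await asyncio.sleep(0.05)
    return threading.current_thread().name


async def serve(n_requests: int, blocking: bool) -> tuple[set, float]:
    """Run n_requests handlers on one event loop; return thread names and elapsed time."""
    start = time.perf_counter()
    names = await asyncio.gather(
        *(handle_request(i, blocking) for i in range(n_requests))
    )
    return set(names), time.perf_counter() - start


cooperative_threads, cooperative_time = asyncio.run(serve(10, blocking=False))
blocking_threads, blocking_time = asyncio.run(serve(10, blocking=True))
```

Both runs use a single thread. In the cooperative run the ten waits overlap, so the total is close to one wait. In the blocking run each time.sleep holds the loop, so the waits add up and every other request is delayed behind them.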

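Goroutines / green threads (Go, Elixir/BEAM processes, Java virtual threads since JDK 21): lightweight user-space 'threads', cheap enough to run by the hundreds of thousands or millions, scheduled by the runtime onto a small pool of OS threads. You write blocking-style code; when a call would block on I/O, the runtime parks that lightweight thread and runs another one on the same OS thread. This gives the readability of thread-per-request with concurrency close to an event loop, though CPU-bound work is still limited by the number of cores.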

Pre-fork worker pool (Gunicorn, Puma in cluster mode, PHP-FPM, Passenger): a master process forks N worker processes up front, and each worker handles requests. A worker can internally use threads or an event loop. Crashes are isolated: if one worker dies, the others keep serving. This is the usual deployment shape for Python and Ruby apps.
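
The pre-fork shape and its crash isolation can be sketched with os.fork and pipes. This is a Unix-only toy under stated assumptions: real servers share a listening socket instead of per-worker pipes, and the names worker_loop, prefork and send, as well as the "crash" request, are hypothetical.

```python
import os


def worker_loop(requests, replies) -> None:
    """Runs inside a worker process: serve requests until the pipe closes."""
    for line in requests:
        request = line.strip()
        if request == "crash":
            os._exit(1)  # simulate a fatal bug that kills this worker only
        replies.write(f"{os.getpid()} {request.upper()}\n")
        replies.flush()


def prefork(n_workers: int) -> list:
    """Fork all workers up front, before any request arrives."""
    workers = []
    for _ in range(n_workers):
        req_r, req_w = os.pipe()
        rep_r, rep_w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            code = 1
            try:
                # Drop parent-side pipe ends, including inherited ones.
                for _, to_w, from_w in workers:
                    to_w.close()
                    from_w.close()
                os.close(req_w)
                os.close(rep_r)
                with os.fdopen(req_r) as requests, os.fdopen(rep_w, "w") as replies:
                    worker_loop(requests, replies)
                code = 0
            finally:
                os._exit(code)  # never fall through into the parent's code
        os.close(req_r)
        os.close(rep_w)
        workers.append((pid, os.fdopen(req_w, "w"), os.fdopen(rep_r)))
    return workers


def send(worker, request: str) -> str:
    """Send one request to a worker; an empty reply means the worker died."""
    _, to_worker, from_worker = worker
    to_worker.write(request + "\n")
    to_worker.flush()
    return from_worker.readline().strip()


workers = prefork(3)
worker_pids = {pid for pid, _, _ in workers}
replies = [send(w, f"req-{i}") for i, w in enumerate(workers[:2])]
crash_reply = send(workers[2], "crash")        # third worker exits
after_crash = send(workers[0], "still-alive")  # first worker is unaffected

exit_codes = {}
for pid, to_w, from_w in workers:
    to_w.close()  # end-of-file tells a live worker to stop
    from_w.close()
    _, status = os.waitpid(pid, 0)
    exit_codes[pid] = os.waitstatus_to_exitcode(status)
```

The third worker exits with a non-zero code while the first keeps answering, which is the isolation property described above. A production master process would also notice the dead worker and fork a replacement.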

Grounded on https://en.wikipedia.org/wiki/C10k_problem
