Dualo
SQL Essentials

Window functions — ranking, running totals, LAG/LEAD

Aggregate without collapsing rows. Ranks, running totals, period-over-period comparisons — the superpower of modern SQL.

2 min read

Window = partition × order × frame. PARTITION BY groups rows (each group is a window independent from others). ORDER BY inside OVER establishes ordering within the window. The FRAME specifies which rows (relative to current) contribute to the aggregate: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (running total), ROWS BETWEEN 6 PRECEDING AND CURRENT ROW (7-day moving window), RANGE BETWEEN interval '30 days' PRECEDING AND CURRENT ROW (30-day by value, not by count).

ROW_NUMBER vs RANK vs DENSE_RANK: for ORDER BY score DESC on [100, 100, 90]: ROW_NUMBER = 1, 2, 3. RANK = 1, 1, 3 (tied ranks, gap). DENSE_RANK = 1, 1, 2 (tied ranks, no gap). Pick based on whether you want uniqueness vs tie-allowance.

LAG / LEAD default NULL if out of bounds; LAG(col, n, default) substitutes. Useful to avoid NULL-check wrappers: LAG(price, 1, 0) treats the row before the first as 0.

**Default frame behaviour** depends on presence of ORDER BY inside OVER: without ORDER BY = full partition (classic aggregate OVER); with ORDER BY = RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (running total semantics). This default is a frequent footgun — always explicit when you care.

ROWS vs RANGE vs GROUPS framing: ROWS counts physical rows. RANGE follows value ranges (same ORDER-BY value = same frame row). GROUPS (SQL:2011, Postgres 11+) counts groups of tied ORDER-BY values. Most use cases want ROWS for clarity.

**Top-N per group canonical pattern**: `SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn FROM orders) t WHERE rn = 1`. Alternative in Postgres: `DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, order_date DESC`. also works.

Performance: window functions scan the input, sort by PARTITION + ORDER (if needed), then iterate in order maintaining running state. Sorting is the expensive step — an index on (partition, order) columns lets the engine skip the sort (Postgres 'window aggregate' plan with 'Index Scan').

**Gotchas**: (i) window functions can't appear in WHERE (because WHERE runs BEFORE SELECT in logical order) — nest the query or use a ; (ii) mixing window + GROUP BY in the same SELECT is valid (window runs AFTER group), but confusing — split with a CTE; (iii) forgetting ORDER BY in running-aggregate context gives arbitrary order per call.

Grounded on https://www.postgresql.org/docs/current/tutorial-window.html

Next up

NULL handling — three-valued logic and gotchas

NULL means 'unknown' in SQL — not 'empty' and not 'zero'. Three-valued logic (TRUE, FALSE, UNKNOWN) quietly breaks naive queries.