Dualo
SQL Essentials

GROUP BY & aggregates

Roll up rows into buckets and summarize: COUNT, SUM, AVG, MIN, MAX. The foundation of every report.

1 min read

**GROUP BY execution**: usually (build a hash map of group key → running aggregate state) or **streaming aggregation** (if input is already sorted by the group key — e.g. after a merge join or in order). Hash aggregation requires memory for the distinct-groups state; large cardinality groups can spill to disk.

**Aggregate + semantics**: aggregates SKIP NULL inputs (except `COUNT(*)`). `AVG(salary)` over 10 rows with 2 NULLs averages 8 values. If you wanted NULL-as-zero, coalesce first: `AVG(COALESCE(salary, 0))`.

COUNT variants performance: COUNT(*) is typically fastest — engine counts rows without touching columns. COUNT(col) must read the column to skip NULLs. COUNT(DISTINCT col) requires a sort or hash to dedup.

GROUP BY 1, 2, ...: positional GROUP BY refers to the N-th column in SELECT. Portable (SQL standard), but breaks when SELECT changes. Fine in ad-hoc; name columns in committed code.

GROUP BY + ORDER BY relationship: ORDER BY operates on the grouped result, and can reference aggregates or aliases. ORDER BY COUNT(*) DESC is valid. ORDER BY runs last in the logical order.

Rollups and cubes (advanced grouping sets): GROUP BY ROLLUP(country, region) produces (country, region), (country, NULL), (NULL, NULL) — subtotals + grand total. CUBE produces all combinations. GROUPING SETS is the general form. Power tool for reporting.

FILTER clause (standard, supported in Postgres / SQL Server 2022+ / DuckDB): SUM(amount) FILTER (WHERE status = 'paid') — aggregate only matching rows within the group. Cleaner than the classic SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) pattern.

**Performance note**: filter WITH WHERE before grouping whenever possible — the optimizer often does this automatically but explicit predicates on the grain-defining columns help ensure .

Grounded on https://www.postgresql.org/docs/current/tutorial-agg.html

Next up

Subqueries & CTEs

Nest one query inside another, or pull it out as a named CTE (WITH clause) for readability. The tool that turns 200-line queries into 20-line ones.