GROUP BY & aggregates

Roll up rows into buckets and summarize: COUNT, SUM, AVG, MIN, MAX. The foundation of every report.

1 min read

GROUP BY bundles rows that share a value, then aggregate functions compute one result per bundle. SELECT country, COUNT(*) FROM customers GROUP BY country — one row per country, with the count.

**Core aggregates**: `COUNT(*)` (all rows including s), `COUNT(col)` (non-NULL values), `COUNT(DISTINCT col)` (unique values), `SUM`, `AVG`, `MIN`, `MAX`. Statistical extras in Postgres: `STDDEV`, `VARIANCE`, `PERCENTILE_CONT`, `ARRAY_AGG`, `STRING_AGG`.

Rule: every column in SELECT that is NOT inside an aggregate function must appear in GROUP BY. Otherwise the engine doesn't know which of the many values in the bucket to return. Some engines (MySQL historically) relaxed this — got bitten by random-looking results.

WHERE vs HAVING: WHERE filters rows BEFORE aggregation. HAVING filters groups AFTER aggregation. WHERE country = 'FR' keeps French customers; HAVING COUNT(*) > 100 keeps countries with more than 100 customers. Mixing them up = wrong query.

Concrete example: 'top 5 countries by customer count' → SELECT country, COUNT(*) AS n FROM customers GROUP BY country ORDER BY n DESC LIMIT 5. Not rocket science once the pattern clicks.

GROUP BY multiple columns: GROUP BY country, year — one bucket per (country, year) pair. Useful for pivoting data across dimensions.

Grounded on https://www.postgresql.org/docs/current/tutorial-agg.html

Next up

Subqueries & CTEs

Nest one query inside another, or pull it out as a named CTE (WITH clause) for readability. The tool that turns 200-line queries into 20-line ones.