Dualo
SQL Essentials

SELECT basics — filter, sort, limit

The bread and butter: SELECT columns FROM table WHERE conditions ORDER BY sort LIMIT n. Learn this well; everything else builds on it.

1 min read

Logical order of evaluation: FROM + JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET. This matters for aliases (you can't reference a SELECT alias in WHERE because WHERE runs first), and for understanding why moving a predicate between WHERE and HAVING matters.

**Predicates and SARGability**: a predicate is **** (Search-ARGument-able) if the engine can use an scan. `WHERE created_at >= '2026-01-01'` is sargable. `WHERE YEAR(created_at) = 2026` is NOT — the function on the column forces a full scan. Rewrite as `WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'`.

**Three-valued logic**: comparisons with return NULL (not TRUE, not FALSE). `WHERE x = NULL` never matches anything — use `WHERE x IS NULL`. `WHERE x != 'foo'` also excludes NULLs — use `WHERE x IS DISTINCT FROM 'foo'` in Postgres, or `WHERE (x != 'foo' OR x IS NULL)` portably.

**OFFSET + LIMIT pagination gotcha**: `LIMIT 10 OFFSET 100000` forces the engine to scan + discard 100k rows. For deep pagination, prefer ****: `WHERE id > :last_seen_id ORDER BY id LIMIT 10` — consistent performance regardless of position.

**SELECT * in production**: (i) breaks consumers when schema gains columns; (ii) ships unneeded columns over the wire — wasted bandwidth + memory; (iii) defeats (an index covers the needed columns only if you list them). Use `SELECT *` only in REPL / exploration.

DISTINCT performance: implemented as sort + dedup OR hash aggregation. On large tables, prefer GROUP BY col over DISTINCT col when you can — the planner treats them similarly but GROUP BY composes better when you add aggregates later. For 'is there any duplicate?' use EXISTS, not COUNT(*) > COUNT(DISTINCT ...).

**LIKE with wildcard**: `LIKE 'abc%'` can use a B-tree index. `LIKE '%abc'` cannot (trigram index helps — pg_trgm in Postgres). Plan your indexes around the query shape.

Grounded on https://www.postgresql.org/docs/current/queries-select-lists.html

Next up

Joins — INNER, LEFT, RIGHT, FULL

Combine rows across tables using keys. Pick the right join type or you'll silently lose or duplicate data.