Indexes & query performance

An index is a secondary data structure that turns O(N) table scans into O(log N) lookups. Know when to add one — and when NOT to.

Easy Technical

1 min read

An **** is a sorted data structure on one or more columns. Without it, finding 'customer id = 42' in a 10M row table = scanning all 10M rows. With a B-tree index on id, it's ~24 comparisons (log2 10M). Orders of magnitude difference.

**** is the default index type — works for equality (`=`), range (`<`, `>`, `BETWEEN`), sort (`ORDER BY`), and prefix-match (`LIKE 'abc%'`). 99% of indexes you'll create are B-trees.

**** span multiple columns: `CREATE INDEX ON orders(customer_id, created_at)`. Useful for queries filtering on customer_id AND sorting by created_at — the index already has both. Column order matters: the index helps `WHERE customer_id = X` alone, but NOT `WHERE created_at > Y` alone.

**Indexes aren't free**. They take disk space (often 10-30% of the table size), slow down INSERT/UPDATE/DELETE (each write also updates the index), and if you're unlucky, the picks the wrong one. Don't over-index — add indexes based on slow-query evidence.

**EXPLAIN** (or EXPLAIN ANALYZE for real-run stats) shows the query plan — which indexes are used, which are ignored, estimated vs actual row counts. The #1 tool for diagnosing slow queries. Use it before guessing.

Index anti-patterns: (i) indexing every column 'just in case' — hurts writes + wastes RAM buffer; (ii) function on indexed column in WHERE — defeats index; (iii) VARCHAR(255) for join keys — bigger index, slower lookups. Index deliberately.

Grounded on https://use-the-index-luke.com/

Next up

Transactions & isolation levels

ACID, what it guarantees, and how isolation levels (Read Committed, Repeatable Read, Serializable) trade consistency for concurrency.