Dualo
SQL Essentials

Indexes & query performance

An index is a secondary data structure that turns O(N) table scans into O(log N) lookups. Know when to add one — and when NOT to.

2 min read

**Index types**: **B-tree** (default; equality + range + order), **Hash** (equality only; faster than B-tree for pure lookup but no range/sort; Postgres has them since v10 WAL-logged), **GiST** / **GIN** (Postgres; full-text, geo, JSON, arrays), **BRIN** (Block-Range; summary per block — tiny index for naturally-sorted columns like time-series), **Bitmap** (SQL Server / Oracle; low-cardinality columns like gender/status), **Columnstore** (SQL Server / analytical workloads).

B-tree structure: balanced tree of sorted key pointers. Height ~ log(N) with typical fanout 100-400 — a 10M row index has depth 3-4. Insert/delete rebalances (cheap amortized).

Covering index (index-only scan): if the index contains all columns the query needs, the engine never reads the table. Postgres supports INCLUDE clause: CREATE INDEX ON orders(customer_id) INCLUDE (amount, created_at). SQL Server: same syntax. MySQL: include columns are just trailing columns of the key.

**Leftmost-prefix rule** for composite indexes: an index on `(a, b, c)` supports `WHERE a = ?`, `WHERE a = ? AND b = ?`, `WHERE a = ? AND b = ? AND c = ?`. It does NOT efficiently support `WHERE b = ?` alone — leftmost column must be equality-bound.

**EXPLAIN / EXPLAIN ANALYZE**: EXPLAIN shows the plan (estimated costs). EXPLAIN ANALYZE runs the query and shows ACTUAL row counts + timing per operator. Key signals: large gap between estimated and actual rows (bad stats → run ANALYZE / UPDATE STATISTICS), unexpected Seq Scan (missing index or non- predicate), expensive nested loop on large inner (hash join would be better — check memory config).

**Index maintenance**: bloat in Postgres (dead tuples from UPDATE/DELETE) → use `REINDEX CONCURRENTLY` periodically or pg_repack. Auto-vacuum settings matter. SQL Server: fragmentation → ALTER INDEX ... REORGANIZE / REBUILD. Index usage stats: `pg_stat_user_indexes` in Postgres, `sys.dm_db_index_usage_stats` in SQL Server — drop unused indexes.

Write overhead: every DML touches every index on the table. An INSERT with 10 indexes = 1 table write + 10 index writes. On hot-write tables, fewer indexes can be faster than more. Batching writes, deferring index updates (bulk load pattern) helps.

**Index selectivity**: fraction of rows matching. Low-selectivity indexes (boolean, gender) rarely help — the engine prefers a scan. Partial indexes help: `CREATE INDEX ON users(created_at) WHERE is_active = true` — smaller, faster, covers only the common filter.

Grounded on https://use-the-index-luke.com/

Next up

Transactions & isolation levels

ACID, what it guarantees, and how isolation levels (Read Committed, Repeatable Read, Serializable) trade consistency for concurrency.