Query optimization

You already know how to read EXPLAIN and when an index helps. This lesson is about the layer above that: the planner. It is cost-based — for every query it estimates how many rows each step will touch, prices the alternatives, and picks the cheapest. Feed it good estimates and it plans well; feed it bad ones and it picks a slow plan even when a perfect index exists.
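One way to watch the "prices the alternatives" step is to discourage the plan the planner prefers and look at the cost it reports for the runner-up. The sketch below assumes the seeded events table and uses the standard enable_seqscan setting inside a transaction, so the change does not leak into the rest of the session. If events has no index on action, the second plan will still be a sequential scan, shown as heavily penalized or disabled depending on the PostgreSQL version.

```sql
-- The plan the planner prefers, with its estimated cost.
EXPLAIN SELECT * FROM events WHERE action = 'purchase';

BEGIN;
-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;
-- The next-best alternative; compare its cost= figures with the plan above.
EXPLAIN SELECT * FROM events WHERE action = 'purchase';
ROLLBACK;
```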

Those estimates come entirely from statistics gathered by ANALYZE. The seed loaded 200,000 events (a login/click/purchase log) and 5,000 users, and ran ANALYZE for you, so the numbers below are trustworthy from the start.
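To see the statistics themselves, you can query the standard pg_stats view. This is a minimal sketch that assumes the seeded events table lives in a single schema; if the same table name exists in several schemas, add a schemaname filter.

```sql
-- Per-column statistics that ANALYZE stored for events.action.
SELECT attname,
       null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'action';
```

The most_common_vals and most_common_freqs arrays line up position by position: each value is paired with the fraction of rows the planner believes carry it.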

Estimated vs. actual: reading the planner's mind

EXPLAIN ANALYZE runs the query and prints both what the planner expected and what actually happened. The gap between them is the single most useful diagnostic you have.

sql

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM events WHERE action = 'purchase';

Look at the Seq Scan line: rows=NNN in the cost section is the estimate; actual rows=NNN is the truth. Here they line up — both near 40,000 — because the stats are fresh. When estimates match reality, the planner's cost math is sound and you can trust the plan it chose.
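The estimate is not magic. For a value that appears in the column's most-common-values list, it is roughly the table's row count multiplied by that value's recorded frequency. The sketch below reproduces that arithmetic from the catalogs. It assumes 'purchase' appears in most_common_vals for events.action and that only one table named events exists; the cast through text is a common workaround for the anyarray type of most_common_vals.

```sql
-- Approximate the planner's row estimate for action = 'purchase'.
SELECT c.reltuples                            AS table_rows,
       m.freq                                 AS purchase_freq,
       round((c.reltuples * m.freq)::numeric) AS approx_estimate
FROM pg_class AS c
JOIN pg_stats AS s
  ON s.tablename = c.relname
-- Pair each most-common value with its frequency.
CROSS JOIN LATERAL unnest(s.most_common_vals::text::text[],
                          s.most_common_freqs) AS m(val, freq)
WHERE c.relname = 'events'
  AND c.relkind = 'r'
  AND s.attname = 'action'
  AND m.val = 'purchase';
```

The result should land close to the rows= figure in the plan; small differences are expected because the planner scales the row count to the table's current size.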

Stale statistics pick bad plans

Statistics are a snapshot. Load a pile of rows without re-analyzing and the planner is flying blind. Watch: we create an empty copy of events, so it starts with no statistics at all, then fill it and query it before anything has analyzed it.

sql

CREATE TABLE recent AS SELECT * FROM events WHERE false;
INSERT INTO recent SELECT * FROM events;
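At this point you can confirm from the catalogs that recent has no statistics yet. This is a sketch using the standard pg_class and pg_stats catalogs; the exact placeholder value in reltuples varies by PostgreSQL version, and autovacuum may analyze the table on its own if you wait long enough.

```sql
-- Table-level counters: on a never-analyzed table, reltuples is a placeholder,
-- not a real row count.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'recent';

-- Column-level statistics: expect a count of 0 until ANALYZE has run.
SELECT count(*) AS columns_with_stats
FROM pg_stats
WHERE tablename = 'recent';
```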

sql

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM recent WHERE action = 'login';

The estimate is a few hundred rows; the actual is 40,000. The planner has no statistics on recent.action, because nothing analyzed the table after the bulk load, so it falls back on a built-in default guess instead of the real value distribution. On a join, an underestimate like this is how you get a nested loop that should have been a hash join. The fix is one command:

sql

ANALYZE recent;
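To check that the fix took, re-run the same query and compare estimated and actual rows again. The second statement in this sketch reads the standard pg_stat_user_tables view to show when each table was last analyzed; it assumes both tables are visible under those names.

```sql
-- Same query as before; the estimated and actual row counts should now be close.
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM recent WHERE action = 'login';

-- When statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('events', 'recent');
```

A large n_mod_since_analyze on a table you are about to query is a hint that its statistics may be stale.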
