SQL indexing: when it helps and when it hurts

An index is a trade. It makes some reads dramatically faster in exchange for slower writes, more storage, and a planner that now has more choices to get wrong. Treated as a defaul…

An index is a trade. It makes some reads dramatically faster in exchange for slower writes, more storage, and a planner that now has more choices to get wrong. Treated as a default reflex ("the query is slow, add an index"), it often disappoints. Treated as a deliberate decision based on how a column is queried, it is one of the highest leverage tools you have. This post covers how B-tree indexes work, when they earn their keep, and the common cases where they do nothing or actively cost you, with the PostgreSQL and MySQL specifics that matter.

How a B-tree index works

A B-tree keeps its keys sorted and balanced, so the database can find a value, or the start of a range, in a small number of page reads rather than scanning the whole table. In PostgreSQL, CREATE INDEX builds a B-tree by default because it suits the most common cases, and the planner will consider it for the ordering and comparison operators (less than, less than or equal, equal, greater than or equal, greater than) as well as BETWEEN, IN, and IS NULL. In MySQL, B-tree is the structure InnoDB uses for its indexes, kept sorted to allow fast equality and range lookups.

One InnoDB detail is worth holding in mind because it explains a lot of later behaviour: every InnoDB table has a clustered index that stores the row data itself, usually the primary key. Every other index is a secondary index, and each of its entries also carries the primary key columns. That is why InnoDB secondary indexes can answer more queries from the index alone than you might expect.

When an index helps

Selectivity decides everything

An index pays off when a query touches a small fraction of the table. Selecting a thousand rows out of a hundred thousand is a good candidate. Selecting one row out of a hundred is usually not, because those hundred rows probably fit in a single page and no plan can beat reading one page sequentially. On a table that occupies a single page, you will nearly always get a sequential scan whether an index exists or not, and the planner is right to do so. This is why an index on a low cardinality column (a boolean, a status with three values) is frequently ignored: most values are not selective enough to be worth the indirection.

The practical lesson is to index columns you filter or join on with high selectivity, and to stop expecting an index to help when the predicate matches a large share of rows.

Covering indexes and index-only scans

If an index contains every column a query needs, the database can answer from the index without visiting the table at all. PostgreSQL calls this an index-only scan, and you can deliberately build a covering index by adding non-key columns with INCLUDE, so they ride along as payload without being part of the search key.

CREATE INDEX orders_customer_idx ON orders (customer_id) INCLUDE (status, total);

MySQL expresses the same idea as a covering index, but there is no INCLUDE clause: you list the columns in the index itself, and InnoDB's habit of appending the primary key to secondary indexes means more of your queries are covered than the index definition alone suggests.

Composite indexes and column order

A multicolumn index is ordered by its first column, then its second within that, and so on, so column order is not cosmetic. The two engines differ in how forgiving they are. MySQL uses a strict leftmost prefix rule: an index on (col1, col2, col3) supports lookups on (col1), (col1, col2), and (col1, col2, col3), but not on col2 alone, because that is not a leftmost prefix. PostgreSQL is more permissive, a multicolumn B-tree can be used with any subset of its columns, but it is most efficient when the leading columns are constrained, and the firm rule is that equality constraints on the leading columns plus the first inequality are what limit the scan. Either way, put the columns you filter on by equality first, and the range or sort column last.

When an index hurts or goes unused

Write and storage cost

Once an index exists, the database keeps it synchronised with the table, which adds overhead to every insert, update, and delete, and it occupies storage. PostgreSQL states this plainly and advises that indexes which are seldom or never used should be removed. Every extra index you carry is paid for on every write, so a table with ten indexes is doing ten times the index maintenance per row change. Index for the reads you actually run, and drop the ones that no query uses.

Predicates that defeat an index

A column index only helps if the query lets the database use it. Several common patterns quietly prevent that:

  • Applying a function or expression to the column. A filter like lower(email) = 'x' cannot use a plain index on email. The expression itself has to be indexed: PostgreSQL supports indexes on expressions, and MySQL supports functional key parts (implemented as hidden generated columns) that the optimiser matches against.
  • A leading wildcard in a pattern match. A B-tree can serve LIKE 'foo%' because the pattern is anchored at the start, but not LIKE '%bar', because there is no prefix to seek to.
  • A type mismatch that forces a conversion. In MySQL, comparing an indexed string column to a number cannot use the index, because many different strings ('1', ' 1', '1a') convert to the same number, so the index ordering is no help.
-- Cannot use a plain index on email:
SELECT * FROM users WHERE lower(email) = 'a@example.com';

-- Give the expression its own index (PostgreSQL):
CREATE INDEX users_lower_email_idx ON users (lower(email));

Confirming what the planner does

Do not guess whether an index is used, ask. In PostgreSQL, EXPLAIN shows the chosen plan (an Index Scan or Index Only Scan versus a Seq Scan), and EXPLAIN ANALYZE actually runs the query and reports real row counts and timings. In MySQL, EXPLAIN reports the candidate indexes in possible_keys and the one actually chosen in key, and a key of NULL means no index was used and the access type ALL means a full table scan.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Partial and expression indexes

Two PostgreSQL features let you index less and gain more. A partial index covers only the rows that satisfy a predicate, so an index on the small fraction of orders WHERE status = 'open' is smaller and cheaper to maintain than one over every row. An expression index, as above, indexes a computed value rather than a stored column. Both narrow the index to exactly what your queries need, which is usually the point: a smaller, well targeted index beats a large general one.

CREATE INDEX orders_open_idx ON orders (created_at) WHERE status = 'open';

Conclusion

Indexing is not about adding indexes, it is about matching them to how the data is queried. A B-tree on a selective column you filter or join on, ordered to match your predicates, and trimmed with a partial or covering definition, can turn a scan into a handful of page reads. The same index on a low selectivity column, or one defeated by a function, a leading wildcard, or a type mismatch, costs you on every write and returns nothing. Decide with EXPLAIN, not with hope, and remove the indexes that no query uses.