Understanding transactions and isolation levels

A transaction groups several statements into one unit that either fully happens or does not happen at all. Isolation levels decide how much one in flight transaction can see of another's uncommitted or concurrent work. Get the level wrong and you either allow subtle data races or you serialise everything and lose throughput. The defaults are not the same across engines, which is the single most common source of surprise, so this post walks through the anomalies, the four standard levels, and how PostgreSQL and MySQL actually behave.

ACID in one paragraph

A transaction is described by four properties. Atomicity means all of its changes commit together or none do. Consistency means the database moves from one valid state to another, and a reader sees all old values or all new ones, never a mix. Isolation means concurrent transactions do not see each other's in progress work. Durability means that once a commit succeeds, the changes survive a crash or power loss. Isolation is the property with a dial on it, and that dial is the isolation level.

The read anomalies

The isolation levels are defined by which concurrency anomalies they permit.

Dirty read

A transaction reads a row that another transaction has written but not yet committed. If the writer rolls back, the reader acted on data that never existed.

Non-repeatable read

A transaction reads a row, a second transaction commits a change to that row, and when the first transaction reads it again it gets a different value. The same query, twice, two answers.

Phantom read

A transaction runs a query that matches a set of rows, another transaction commits a new row that also matches, and re-running the query now returns an extra row. The difference from a non-repeatable read is that whole rows appear or vanish rather than existing values changing.

The four isolation levels

The SQL standard defines four levels in terms of which anomalies they allow. Read uncommitted permits dirty reads, non-repeatable reads, and phantoms. Read committed forbids dirty reads but still permits non-repeatable reads and phantoms. Repeatable read additionally forbids non-repeatable reads, leaving phantoms. Serializable forbids all three and makes concurrent transactions behave as if they ran one after another. That is the standard. What each engine actually implements is stricter in places, and the defaults differ.

How PostgreSQL behaves

PostgreSQL defaults to read committed. It uses multiversion concurrency control (MVCC), so each statement sees a snapshot of the data as it was at a point in time, and reading never blocks writing while writing never blocks reading. Internally it implements only three distinct levels, so asking for read uncommitted gives you read committed behaviour: PostgreSQL never returns a dirty read.

Its repeatable read is stronger than the standard requires. It is implemented as snapshot isolation and does not allow phantom reads. Its serializable level adds Serializable Snapshot Isolation (SSI) on top, which detects dependencies among concurrent transactions and aborts one rather than allowing a non serialisable outcome. Because of that, both serializable and repeatable read transactions can fail at commit with a serialization error, and the application must be prepared to retry them.

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... statements ...
COMMIT;  -- may raise: could not serialize access due to read/write dependencies

How MySQL InnoDB behaves

MySQL with InnoDB defaults to repeatable read, not read committed. This difference alone catches people porting assumptions between the two. InnoDB uses consistent snapshot reads, so within a transaction all plain reads see the snapshot established by the first read.

InnoDB prevents phantoms at repeatable read through locking rather than pure snapshotting. It uses next-key locking, which combines an index record lock with a gap lock on the gap before the record, so other transactions cannot insert into the range you are reading. Drop to read committed and gap locking is largely disabled, which lets phantoms reappear. MySQL reaches serializable mainly by locking too: it implicitly turns plain SELECT statements into locking reads. The practical consequence is that the failure you hit under contention in MySQL is usually a lock wait timeout or a deadlock (which rolls back one transaction) rather than the dependency based serialization abort you get in PostgreSQL.

Handling serialization failures

At the strict levels, "the database will sort it out" is not a strategy. Under PostgreSQL serializable, and repeatable read too, a transaction can be aborted to preserve correctness, and the only correct response is to retry it from the beginning. Under MySQL, a deadlock victim is rolled back and likewise needs retrying. Either way, wrap transactions that run at these levels in retry logic with a bounded number of attempts, and make the work inside them idempotent so a retry is safe.

Setting the level

Both engines let you set the level per transaction with SET TRANSACTION ISOLATION LEVEL, choosing from serializable, repeatable read, read committed, and read uncommitted. Set it deliberately based on what the transaction needs: read committed for ordinary work, repeatable read or serializable when a transaction makes decisions based on data it must not see change underneath it.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

Conclusion

Isolation levels are a trade between how much concurrency anomaly you tolerate and how much throughput you keep. Start from the defaults, but know them: PostgreSQL gives you read committed, MySQL InnoDB gives you repeatable read, and the two engines reach the stricter levels by different means, snapshots and SSI in PostgreSQL, gap and next-key locks in MySQL. Raise the level only where a transaction genuinely needs it, and where you do, add retry logic, because at the top of the scale the database will sometimes refuse to commit rather than be wrong.