Database migrations without downtime

A database migration is safe when old code and new code can run at the same time while the schema changes. Downtime usually appears when a migration takes an exclusive lock, rewri…

A database migration is safe when old code and new code can run at the same time while the schema changes. Downtime usually appears when a migration takes an exclusive lock, rewrites a large table, blocks writes or makes application versions disagree about the shape of the data. The fix is to design migrations as small, reversible, compatibility-preserving steps.

Use expand and contract

The safest pattern is expand and contract. First expand the schema so both old and new application versions can work. Then deploy the application change. Then backfill data if needed. Finally contract the schema after the old code path is gone.

A nullable column is usually safer to add than a required column with an immediate full-table backfill. The application can start writing the new column, a background job can backfill existing rows in batches, and a later migration can add a constraint after validation.

ALTER TABLE users ADD COLUMN display_name text;

Avoid changing a column meaning in place. Add the new column or table, dual-write when necessary, verify consistency, move reads, then remove the old path in a later release.

Avoid long blocking operations

DDL is not automatically harmless. Some database engines can perform many operations online, but the exact lock level depends on the engine, version, table definition and operation.

In PostgreSQL, many forms of ALTER TABLE acquire an ACCESS EXCLUSIVE lock, while some subcommands take a lesser lock level documented per operation. PostgreSQL also supports CREATE INDEX CONCURRENTLY, which builds an index without taking locks that prevent concurrent inserts, updates or deletes on the table.

CREATE INDEX CONCURRENTLY orders_customer_id_idx
ON orders (customer_id);

In MySQL InnoDB, online DDL uses algorithms such as INSTANT, INPLACE and COPY. A migration should request the intended algorithm when possible, so it fails rather than silently falling back to a slower or more blocking path.

ALTER TABLE users
ADD COLUMN display_name varchar(255),
ALGORITHM=INSTANT;

For large MySQL table changes that cannot be done safely by native online DDL, tools such as gh-ost copy data to a shadow table and apply ongoing changes while the original table remains in use.

Backfill in batches

Backfills should be restartable, observable and rate limited. A single transaction that updates millions of rows can hold locks for too long, generate too much replication lag and make rollback expensive.

Use deterministic batches. Record progress outside the transaction or make each batch idempotent. Keep transactions short, sleep between batches when the system is under load and stop automatically if replication lag or error rates rise.

UPDATE users
SET display_name = username
WHERE display_name IS NULL
  AND id >= 10000
  AND id < 11000;

Do not assume the backfill is complete because the job reached the end once. Verify with a count query, then keep the application tolerant of missing values until the constraint is enforced.

Keep application versions compatible

Zero downtime deployment means more than database availability. It means any live application instance can talk to the database during the rollout.

A safe migration must support four states: old code with old schema, old code with expanded schema, new code with expanded schema, and new code after cleanup. If any state breaks, the deployment depends on perfect sequencing and becomes fragile.

Feature flags can help move reads and writes gradually. They also give a rollback path that does not require an emergency schema reversal.

Validate before enforcing

Constraints are valuable because they move invariants into the database. They still need careful rollout. Add them after the data is clean, and use engine-specific validation features where available.

Before adding a NOT NULL, unique constraint or foreign key, run checks that prove the existing data satisfies it. Then add the constraint in the least blocking way the database supports.

Test on production-like data

A migration that completes instantly on a small development database may rewrite a table for hours in production. Test with realistic row counts, indexes, constraints, triggers, replication and concurrent writes.

Record the expected lock behaviour, runtime, replication impact and rollback plan. The rollback plan should say whether you can revert application code only, stop a backfill, drop a new object or restore data from backup.

Conclusion

Downtime-free migrations are built from compatibility, small steps and measured database behaviour. Expand first, deploy code second, backfill carefully, validate constraints and contract only after the old path is gone. Never treat DDL as safe until the exact operation has been checked against the engine documentation and tested on data that behaves like production.