Backups you can actually restore from

A backup is only useful when it can be restored inside the recovery window and to an acceptable point in time. Until restoration is tested, the backup is only an assumption.

Start with RPO and RTO

Recovery point objective, or RPO, is the maximum acceptable data loss measured as time. Recovery time objective, or RTO, is the maximum acceptable time to restore service.

These numbers should be set by business impact, not by default tool settings. A public marketing site, an order database, and an audit log archive can have very different recovery needs.

Backups must be designed to meet both values. A daily backup cannot meet a fifteen-minute RPO. A backup that takes two days to restore cannot meet a four-hour RTO.
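
The two comparisons above can be written down as a small check. This is a minimal sketch, assuming a simple periodic schedule where worst-case data loss is roughly the gap between two successful backups. The function name and parameters are illustrative, not taken from any tool.

```python
from datetime import timedelta

def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Return (rpo_ok, rto_ok) for a simple periodic backup schedule."""
    # Assumption: worst-case data loss is about one backup interval.
    rpo_ok = backup_interval <= rpo
    # The measured or estimated restore time must fit inside the RTO.
    rto_ok = restore_duration <= rto
    return rpo_ok, rto_ok

# The example from the text: daily backups, two-day restore.
result = meets_objectives(
    backup_interval=timedelta(days=1),
    restore_duration=timedelta(days=2),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=4),
)
print(result)  # (False, False)
```

In practice the restore duration should come from a measured restore test rather than an estimate.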

Know what must be recovered

Reliable recovery needs more than database files. List everything required to restore the service.

Typical recovery scope includes:

application data
schema and migration history
object storage
search indexes or a rebuild plan
message streams or replay position
configuration
secrets and key material
infrastructure definitions
container images or release artefacts
DNS and routing configuration
runbooks and access paths

If a component can be rebuilt, document the rebuild steps and expected duration. If it cannot be rebuilt inside the RTO, treat it as something to back up and restore, and include it in the recovery plan.
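
One way to keep this decision visible is a small inventory that records the expected rebuild time per component and flags anything that would overrun the RTO. The sketch below is illustrative only. The `Component` class, the component names, and the durations are assumptions, not measurements.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class Component:
    name: str
    rebuild_time: Optional[timedelta]  # None means it cannot be rebuilt at all

def cannot_rebuild_in_time(components, rto):
    """Names of components that cannot be rebuilt inside the RTO."""
    return [
        c.name
        for c in components
        if c.rebuild_time is None or c.rebuild_time > rto
    ]

# Hypothetical inventory with made-up durations.
inventory = [
    Component("application data", None),
    Component("search indexes", timedelta(hours=6)),
    Component("container images", timedelta(minutes=30)),
]
flagged = cannot_rebuild_in_time(inventory, rto=timedelta(hours=4))
print(flagged)  # ['application data', 'search indexes']
```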

Protect backups from the same failure

Backups should survive the incident that made them necessary. Accidental deletion, compromised credentials, ransomware, regional outage, bad deployment, and operator error all have different failure patterns.

Use separation deliberately. That may mean separate accounts, separate regions, immutable storage, restricted deletion rights, separate encryption keys, and monitored backup access. The right design depends on the threat model and recovery objectives.

A backup that can be deleted by the same identity that can delete production data is not strong protection against account compromise.
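
A simple review of this risk is to compare who can delete production data with who can delete backups. The sketch below assumes the two identity lists have already been exported from the access control system. The identity names are hypothetical.

```python
def shared_delete_identities(prod_deleters, backup_deleters):
    """Identities able to delete both production data and backups."""
    return sorted(set(prod_deleters) & set(backup_deleters))

# Hypothetical identities, for illustration only.
prod = {"app-admin", "deploy-bot", "dba"}
backups = {"backup-admin", "dba"}

overlap = shared_delete_identities(prod, backups)
print(overlap)  # ['dba']
```

An empty result does not prove separation on its own, since it ignores role assumption and inherited permissions, but a non-empty result is a clear finding.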

Automate creation and verification

Manual backups are easy to miss. Automate backup creation, retention, expiry, and monitoring.

Monitor at least:

last successful backup time
backup size and unexpected size changes
backup duration
backup failure count
replication or copy status
retention policy compliance
encryption status
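
Two of these signals, backup age and unexpected size change, can be checked with very little code. This is a minimal sketch under stated assumptions: the function name, alert labels, and the 50 percent size threshold are illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta, timezone

def backup_alerts(last_success, size_bytes, previous_size_bytes,
                  now, max_age, max_size_change=0.5):
    """Return a list of alert labels for one backup target."""
    alerts = []
    # Stale: the last successful backup is older than allowed.
    if now - last_success > max_age:
        alerts.append("stale")
    # Size change: relative change against the previous backup.
    if previous_size_bytes > 0:
        change = abs(size_bytes - previous_size_bytes) / previous_size_bytes
        if change > max_size_change:
            alerts.append("size_change")
    return alerts

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
alerts = backup_alerts(
    last_success=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    size_bytes=40,
    previous_size_bytes=100,
    now=now,
    max_age=timedelta(hours=26),
)
print(alerts)  # ['stale', 'size_change']
```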

Verification must go beyond job success. A completed backup job does not prove that the data is usable.

Restore regularly

Periodic restore tests prove whether the backup process meets RPO and RTO. They also reveal missing permissions, missing configuration, slow transfer paths, incompatible versions, broken encryption keys, and undocumented manual steps.

A restore test should create a fresh environment, restore from backup, run integrity checks, run application smoke tests, and record elapsed time. The result should be reviewed against the stated objectives.
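
The sequence described above can be wrapped in a small harness that runs each step, stops at the first failure, and compares elapsed time with the RTO. This is a sketch only. The step functions here are placeholders, and a real test would call provisioning, restore, and validation tooling in their place.

```python
import time
from datetime import timedelta

def run_restore_test(steps, rto):
    """Run (name, callable) steps in order and time the whole run."""
    start = time.monotonic()
    results = {}
    for name, step in steps:
        try:
            step()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
            break  # later steps depend on earlier ones
    elapsed = timedelta(seconds=time.monotonic() - start)
    passed = len(results) == len(steps) and all(
        r == "ok" for r in results.values()
    )
    return {
        "steps": results,
        "elapsed": elapsed,
        "within_rto": elapsed <= rto,
        "passed": passed,
    }

# Placeholder steps standing in for real tooling.
steps = [
    ("create environment", lambda: None),
    ("restore from backup", lambda: None),
    ("integrity checks", lambda: None),
    ("smoke tests", lambda: None),
]
report = run_restore_test(steps, rto=timedelta(hours=4))
print(report["passed"], report["within_rto"])
```

Keeping the report as structured data makes it easy to review against the stated objectives.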

Do not test only the easiest path. Where relevant, test point-in-time recovery, single-object recovery, full-environment recovery, and recovery after a deliberately bad change.

Make restoration repeatable

The restore process should be scripted where possible and documented where judgement is required.

A good restore runbook includes:

prerequisites and required access
how to choose the restore point
how to create the recovery environment
restore commands or workflows
validation checks
cutover steps
rollback or abort criteria
communication points
expected timings
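
Choosing the restore point is one step that can often be scripted. The sketch below assumes the goal is the most recent backup taken strictly before a known incident time, and that a list of backup timestamps is available. Both the function name and the timestamps are illustrative.

```python
from datetime import datetime

def choose_restore_point(backup_times, incident_time):
    """Latest backup taken strictly before the incident, or None."""
    candidates = [t for t in backup_times if t < incident_time]
    return max(candidates, default=None)

# Made-up timestamps for illustration.
backup_times = [
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 3, 2, 0, 0),
    datetime(2024, 3, 3, 0, 0),
]
incident = datetime(2024, 3, 2, 14, 30)
chosen = choose_restore_point(backup_times, incident)
print(chosen)  # 2024-03-02 00:00:00
```

When the time of the bad change is uncertain, the choice needs judgement, and the runbook should say who makes it.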

Keep commands current. A restore command copied from an old incident can be worse than no command at all.

Validate integrity and application behaviour

A database that starts is not necessarily a recovered service. Validate the data and the application.

Use checks such as:

database consistency checks
expected table and object counts
application smoke tests
authentication tests
critical read and write paths
background worker checks
audit log continuity
monitoring and alerting checks
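
As one example, expected table counts can be compared with the counts found after restore. This sketch assumes the expected counts were captured at the chosen restore point, otherwise exact equality is the wrong test. The table names and numbers are made up.

```python
def count_mismatches(expected, restored):
    """Return {table: (expected, restored)} where counts differ."""
    mismatches = {}
    for table, want in expected.items():
        got = restored.get(table)  # None if the table is missing
        if got != want:
            mismatches[table] = (want, got)
    return mismatches

# Hypothetical counts for illustration.
expected = {"orders": 1200, "customers": 300, "audit_log": 5000}
restored = {"orders": 1200, "customers": 299}

mismatches = count_mismatches(expected, restored)
print(mismatches)  # {'customers': (300, 299), 'audit_log': (5000, None)}
```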

Record known gaps. If search indexes are rebuilt after restore, state how long that takes and what users see while it happens.

Practise destructive scenarios safely

The hardest recoveries are caused by bad writes, accidental deletion, and compromise. Practise them in non-production or isolated recovery environments.

Useful exercises include:

restore after a deleted table
restore after corrupted application data
restore a single tenant or account where the architecture supports it
restore after credentials are rotated
restore into a clean account or region
prove that immutable backups cannot be altered by normal production roles

The point is not to create theatre. The point is to find the step that fails before a real incident.

Keep evidence

Keep records of restore tests. Record the source backup, restore point, environment, commands or workflow used, duration, validation results, issues found, and follow-up actions.
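
A structured record makes these fields hard to forget and easy to compare between tests. The sketch below mirrors the fields listed above. The class name and all values are hypothetical examples.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class RestoreTestRecord:
    source_backup: str
    restore_point: str
    environment: str
    workflow: str
    duration_minutes: float
    validation_results: dict
    issues_found: list = field(default_factory=list)
    follow_up_actions: list = field(default_factory=list)

# Hypothetical values for illustration.
record = RestoreTestRecord(
    source_backup="orders-db nightly backup",
    restore_point="2024-03-02T00:00:00Z",
    environment="isolated recovery account",
    workflow="restore runbook, scripted steps",
    duration_minutes=95.0,
    validation_results={"smoke_tests": "pass", "table_counts": "pass"},
    issues_found=["restore role lacked access to the encryption key"],
    follow_up_actions=["grant key access to the restore role"],
)
print(json.dumps(asdict(record), indent=2))
```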

This evidence helps audits, but its operational value is greater. It shows whether the team can still restore after architecture, tooling, data volume, or staffing changes.

Conclusion

Backups are a recovery capability, not a storage habit. Define RPO and RTO, protect backups from the failures you expect, automate creation, test restoration, validate the recovered service, and keep evidence. A backup you cannot restore from is not a recovery plan.