Webhooks: delivery, retries and verifying signatures
A webhook is an HTTP callback from one system to another. It is simple to start, but production webhook handling needs explicit rules for delivery, retries, ordering, deduplication, and signature verification.
Assume at-least-once delivery
Webhook providers usually optimise for delivery rather than exactly-once processing. If your endpoint times out, returns a non-2xx status, or is unreachable, the provider may retry the same event. Duplicates can arrive even when your handler succeeded, for example when your 2xx response is lost on the network before the provider receives it.
Design every webhook handler to be idempotent. Store the provider event ID before processing. If the event has already been handled, return a successful response without repeating the side effect.
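A minimal sketch of that check, using Python's built-in sqlite3 module as a stand-in for your real database. The table name processed_events and the helper names are hypothetical. The primary key on the event ID turns "have we seen this?" into a single insert that fails for duplicates, which avoids a separate read-then-write race.

```python
import sqlite3
from typing import Callable, Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical table: one row per provider event ID.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " event_id TEXT PRIMARY KEY,"
        " received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_event_once(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event ID is stored, False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY rejected the insert: this event was already stored.
        return False


def handle_event(
    conn: sqlite3.Connection, event: Dict, side_effect: Callable[[Dict], None]
) -> int:
    if not record_event_once(conn, event["id"]):
        return 200  # duplicate: acknowledge without repeating the side effect
    side_effect(event)
    return 200
```

Because the ID is stored before the side effect runs, a crash between the two would leave the event marked as handled. If that matters, keep a status column and only treat an event as handled once processing completes, or write the ID in the same transaction as the side effect.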
Acknowledge quickly and process asynchronously
The webhook endpoint should do the minimum work needed to authenticate the request, validate the event shape, persist the event, and enqueue processing. Then it should return a 2xx response.
Do not call downstream services before acknowledging the webhook. Long synchronous handlers increase the risk of timeouts, and each timeout can trigger a duplicate delivery. Use a queue or durable job table for work that can take time.
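The split between a fast endpoint and a separate worker can be sketched with a durable job table. Here sqlite3 stands in for a real database or queue; webhook_jobs, accept_webhook and run_worker_once are hypothetical names, and signature verification is assumed to have already succeeded before accept_webhook is called.

```python
import json
import sqlite3
from typing import Any, Callable, Dict


def open_job_table(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webhook_jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def accept_webhook(conn: sqlite3.Connection, event: Dict[str, Any]) -> int:
    """Endpoint path: persist the event and acknowledge. No downstream calls here."""
    with conn:
        conn.execute("INSERT INTO webhook_jobs (payload) VALUES (?)", (json.dumps(event),))
    return 202  # a 2xx response sent as soon as the event is durable


def run_worker_once(
    conn: sqlite3.Connection, process: Callable[[Dict[str, Any]], None]
) -> int:
    """Worker path: process pending jobs outside the request cycle."""
    rows = conn.execute(
        "SELECT id, payload FROM webhook_jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        try:
            process(json.loads(payload))
            new_status = "done"
        except Exception:
            new_status = "failed"  # kept in the table so it can be retried or replayed
        with conn:
            conn.execute(
                "UPDATE webhook_jobs SET status = ? WHERE id = ?", (new_status, job_id)
            )
    return len(rows)
```

This worker assumes a single process. With several workers, each job would need to be claimed atomically (for example by updating its status to a "running" value before processing) so two workers do not pick up the same row.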
Verify signatures before trusting the body
Treat the webhook request as untrusted until verification succeeds. A typical provider signs the raw request body, often together with a timestamp, using a shared secret and sends the signature in a header. Verification must use the exact raw bytes received by the endpoint. Re-serialising parsed JSON can change whitespace or key order and break verification.
Use the provider's official library when one exists. It will usually handle timestamp tolerance, multiple signatures during secret rotation, and algorithm details. If you implement verification yourself, compute an HMAC of the raw body with the shared secret, compare signatures using a constant-time comparison, and reject stale timestamps.
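If you do have to implement verification yourself, the core steps look roughly like this with Python's standard library. The signing scheme is an assumption for illustration: HMAC-SHA256 over the timestamp, a literal ".", and the raw body, with the timestamp and hex signature already extracted from headers. Your provider's documentation defines the real message format, header names, and tolerance.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed value; use the provider's documented tolerance


def sign(secret: bytes, timestamp: int, raw_body: bytes) -> str:
    # Hypothetical scheme: HMAC-SHA256 over b"<timestamp>." + raw body bytes.
    message = str(timestamp).encode() + b"." + raw_body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    raw_body: bytes,
    timestamp: int,
    signature: str,
    now: Optional[int] = None,
) -> bool:
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # stale or far-future timestamp: likely a replay
    expected = sign(secret, timestamp, raw_body)
    # Constant-time comparison so timing does not leak how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Changing even one byte of the body, such as adding whitespace, produces a different HMAC, which is why verification must run on the raw bytes rather than on re-serialised JSON.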
Rotate secrets safely
Webhook signing secrets should be stored like credentials. Do not log them, embed them in client code, or share them across unrelated environments. Production, staging, and development should use separate secrets.
Secret rotation should allow an overlap period where both the old and new secret can verify incoming events. After the provider stops signing with the old secret, remove it from the receiver.
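During the overlap period the receiver can try each configured secret in turn. This sketch assumes a simpler scheme than a real provider might use (HMAC-SHA256 of the raw body only, timestamp check omitted for brevity); verify_any and the secret constants are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence


def compute_signature(secret: bytes, raw_body: bytes) -> str:
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_any(secrets: Sequence[bytes], raw_body: bytes, signature: str) -> bool:
    """True if any currently configured secret produced the signature."""
    # Each individual comparison is constant-time.
    return any(
        hmac.compare_digest(compute_signature(s, raw_body).encode(), signature.encode())
        for s in secrets
    )


# Illustrative values only; real secrets come from a secret store, never source code.
OLD_SECRET = b"old-secret-for-illustration"
NEW_SECRET = b"new-secret-for-illustration"
ACTIVE_SECRETS = [NEW_SECRET, OLD_SECRET]  # both accepted during the overlap
```

Once the provider confirms it no longer signs with the old secret, removing OLD_SECRET from ACTIVE_SECRETS completes the rotation.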
Validate event schema and type
Signature verification proves that the request came from someone with the secret. It does not prove that your code can process every event type safely. Before processing, check the event type, the version (if present), the account or tenant identifier, and the required fields.
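A validation step that runs after verification might look like the following. The field names (id, type, account_id, data, version) and the supported version string are hypothetical; the real values come from the provider's event schema.

```python
from typing import Any, Dict, List

# Hypothetical schema details; adapt to the provider's documentation.
REQUIRED_FIELDS = {"id", "type", "account_id", "data"}
SUPPORTED_VERSIONS = {"v1"}


def validate_event(event: Dict[str, Any], expected_account: str) -> List[str]:
    """Return a list of problems; an empty list means the event may be processed."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(event))]
    version = event.get("version")
    if version is not None and version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {version}")
    account = event.get("account_id")
    if account is not None and account != expected_account:
        # A valid signature does not mean the event belongs to this tenant.
        problems.append(f"unexpected account: {account}")
    if "data" in event and not isinstance(event["data"], dict):
        problems.append("data must be an object")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easy to log every reason an event was rejected.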
Ignore or store unknown event types rather than failing the whole endpoint. Providers add new event types over time. A receiver should not break because it subscribed broadly or because the provider expanded its catalogue.
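A dispatcher that stores unknown event types instead of failing could look like this. The handler registry and the unknown_store list are hypothetical; in practice the store would be a table that operators can inspect.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


def dispatch(event: Event, handlers: Dict[str, Handler], unknown_store: List[Event]) -> str:
    handler = handlers.get(event.get("type"))
    if handler is None:
        # Unknown type: keep it for later inspection rather than returning an error.
        unknown_store.append(event)
        return "stored_unknown"
    handler(event)
    return "handled"
```

Both outcomes lead to a 2xx response, so a new event type in the provider's catalogue never causes retries against your endpoint.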
Handle ordering explicitly
Do not assume events arrive in the order they happened. Retries, parallel delivery, provider partitions, and network delays can reorder events. If order matters, use the event timestamp, sequence number, resource version, or fetch the current resource state from the provider before applying a change.
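One way to make ordering explicit is to remember the last applied resource version and ignore anything older. This sketch assumes the provider sends a version that increases monotonically per resource; the in-memory dict and apply_if_newer are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ResourceState:
    version: int
    status: str


def apply_if_newer(
    store: Dict[str, ResourceState], resource_id: str, version: int, status: str
) -> bool:
    """Apply the change only if it is newer than what we already hold."""
    current = store.get(resource_id)
    if current is not None and current.version >= version:
        return False  # older or duplicate event arrived late: ignore it
    store[resource_id] = ResourceState(version, status)
    return True
```

In a database, the comparison and the write should happen in one statement (for example an UPDATE guarded by WHERE version < ?) so that two concurrent workers cannot both pass the check.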
For many integrations, a fetch-before-process pattern is safer than trusting the event payload as the final state. The webhook tells you that something changed. The provider API tells you what the state is now.
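A fetch-before-process handler treats the payload only as a pointer to the resource that changed. Here fetch_resource is a hypothetical stand-in for a provider API client, and the data.id payload shape is assumed.

```python
from typing import Any, Callable, Dict

Resource = Dict[str, Any]


def process_change(
    event: Dict[str, Any],
    fetch_resource: Callable[[str], Resource],
    local: Dict[str, Resource],
) -> Resource:
    """Use the event only to learn which resource changed, then fetch its current state."""
    resource_id = event["data"]["id"]  # assumed payload shape
    current = fetch_resource(resource_id)  # authoritative state from the provider API
    local[resource_id] = current  # store what the provider says now, not the payload
    return current
```

If a delayed payload says a subscription is active but the API now reports it as cancelled, the local copy ends up cancelled. The cost is an extra API call per event, which usually counts against the provider's rate limits.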
Return the right status
Return 2xx only when the event has been accepted for processing. Return 400 for malformed requests that should not be retried. Return 401 or 403 when authentication or signature verification fails. Return 5xx only for temporary receiver failures where a retry is useful.
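These rules can live in one explicit mapping so that every code path uses the same status codes. The Outcome names are hypothetical; this sketch picks 401 for a failed signature, 503 for temporary failures, and 200 for duplicates because a duplicate has already been handled successfully.

```python
from enum import Enum, auto


class Outcome(Enum):
    ACCEPTED = auto()           # verified, validated, persisted and queued
    DUPLICATE = auto()          # event ID already handled
    MALFORMED = auto()          # body cannot be parsed or fails validation
    BAD_SIGNATURE = auto()      # missing or invalid signature
    TEMPORARY_FAILURE = auto()  # e.g. database unavailable; a retry may succeed


STATUS_FOR = {
    Outcome.ACCEPTED: 202,
    Outcome.DUPLICATE: 200,
    Outcome.MALFORMED: 400,
    Outcome.BAD_SIGNATURE: 401,
    Outcome.TEMPORARY_FAILURE: 503,
}


def status_for(outcome: Outcome) -> int:
    return STATUS_FOR[outcome]
```

Keeping the mapping in one place also makes it easy to check that only TEMPORARY_FAILURE produces a 5xx.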
Be careful when rate limiting a webhook provider. A 429 can trigger retries and worsen a backlog. When possible, accept, persist, and process later rather than rejecting bursts from a trusted provider.
Keep observability per event
Log the provider event ID, event type, delivery attempt if available, verification result, processing status, and correlation ID. Metrics should show received, verified, deduplicated, queued, processed, failed, and retried counts.
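A small helper can emit one structured log line and one counter increment per stage. This sketch uses the standard logging module and a Counter as a stand-in for a real metrics client; record_stage and the stage names are hypothetical.

```python
import json
import logging
from collections import Counter
from typing import Optional

logger = logging.getLogger("webhooks")
metrics: Counter = Counter()  # stand-in for a real metrics client


def record_stage(
    stage: str,
    event_id: str,
    event_type: str,
    correlation_id: str,
    attempt: Optional[int] = None,
) -> str:
    """Increment the counter for a stage and emit one structured log line."""
    metrics[stage] += 1
    # Only these explicit fields are logged; bodies, headers and secrets are not.
    line = json.dumps(
        {
            "stage": stage,  # e.g. received, verified, deduplicated, queued, processed, failed
            "event_id": event_id,
            "event_type": event_type,
            "attempt": attempt,  # None when the provider does not expose it
            "correlation_id": correlation_id,
        },
        sort_keys=True,
    )
    logger.info(line)
    return line
```

Logging an allow-list of fields, rather than the whole request, keeps signing secrets and signature headers out of log storage.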
Provide an operator path to replay stored events safely. Replay must use the same idempotency checks as live delivery.
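Routing replay through the same function as live delivery is one simple way to guarantee it uses the same idempotency check. In this sketch an in-memory set stands in for the durable processed-event store, and deliver and replay are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Event = Dict[str, Any]


def deliver(event: Event, processed_ids: Set[str], process: Callable[[Event], None]) -> bool:
    """Single entry point shared by live delivery and operator replay."""
    event_id = str(event["id"])
    if event_id in processed_ids:
        return False  # already handled: skip the side effect
    process(event)
    processed_ids.add(event_id)  # marked only after processing succeeds
    return True


def replay(
    stored: List[Event],
    event_ids: Iterable[str],
    processed_ids: Set[str],
    process: Callable[[Event], None],
) -> int:
    """Re-run selected stored events through the same idempotency check."""
    wanted = set(event_ids)
    return sum(deliver(e, processed_ids, process) for e in stored if e["id"] in wanted)
```

Replaying an event that was already processed is then a no-op, so an operator can safely replay a whole time window after an incident.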
Test with real provider tooling
Use provider dashboards, CLIs, fixtures, and local forwarding tools to test signature verification and retry behaviour. Unit tests with hand-written JSON are useful, but they do not prove that raw-body handling, headers, timestamps, and endpoint status codes work with the real provider.
Document how to add a new event type, how to rotate secrets, how to replay an event, and how to disable processing during an incident without losing incoming events.
Conclusion
Reliable webhooks are built around at-least-once delivery, fast acknowledgement, durable storage, idempotent processing, signature verification, explicit ordering rules, and event-level observability. The endpoint is not just a controller. It is the boundary between two distributed systems.
