{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Soulstack blog",
  "home_page_url": "https://soulstack.co.uk/blog",
  "feed_url": "https://soulstack.co.uk/blog/feed.json",
  "authors": [
    {
      "name": "Soulstack",
      "url": "https://soulstack.co.uk"
    }
  ],
  "items": [
    {
      "id": "https://soulstack.co.uk/blog/gg-what-does-a-twitch-chatbot-do",
      "url": "https://soulstack.co.uk/blog/gg-what-does-a-twitch-chatbot-do",
      "title": "What does a Twitch chatbot do?",
      "summary": "A Twitch chatbot is a helper that sits in chat and handles repeat jobs. It can post commands, run timers, filter spam, and give moderators extra tools. On a small channel, that me…",
      "content_html": "<p>A Twitch chatbot is a helper that sits in chat and handles repeat jobs. It can post commands, run timers, filter spam, and give moderators extra tools. On a small channel, that means less typing. On a busy channel, it means fewer avoidable messes.</p>\n<p>The point is not to make chat feel robotic. The point is to automate the dull parts so the streamer and mods can stay focused on the conversation.</p>\n<h2>Think of it as a staff tool</h2>\n<p>The best way to judge a bot is not by how many features it has. Judge it by how much repetitive work it removes.</p>\n<p>Useful bot jobs usually fall into four buckets:</p>\n<ul>\n<li>answering common questions</li>\n<li>posting occasional reminders</li>\n<li>filtering obvious bad behaviour</li>\n<li>supporting basic community games or rewards</li>\n</ul>\n<p>If a feature does not help one of those, it is probably optional.</p>\n<h2>Commands are the everyday feature</h2>\n<p>Most streamers first notice bots because of commands. A viewer types <code>!discord</code> or <code>!schedule</code>, and the bot replies with the saved answer.</p>\n<p>That matters because live chat moves fast. Even on a quiet stream, the same questions come up again and again. A bot gives a consistent answer every time without pulling the streamer out of the moment.</p>\n<p>For many small channels, commands alone justify using a bot.</p>\n<h2>Timers handle reminders you forget to say</h2>\n<p>Some information matters, but people rarely ask for it. A timer solves that. It can post a reminder every so often about your Discord, your rules, or a charity link during an event stream.</p>\n<p>This is helpful when used lightly. It is irritating when every ten minutes the chat gets another long sales pitch. The tool is neutral. 
The way it is configured decides whether people are grateful or tired of it.</p>\n<h2>Moderation is where bots earn trust</h2>\n<p>Bots can catch things humans miss or simply react faster:</p>\n<ul>\n<li>repeated spam</li>\n<li>scam links</li>\n<li>mass caps or symbol spam</li>\n<li>banned words or phrases</li>\n<li>self-promo dropped into chat at random</li>\n</ul>\n<p>They are not a replacement for moderators. They are a filter that removes low-effort noise before a person needs to step in.</p>\n<p>That matters most when the stream is busy, but even a small channel benefits from having basic protections in place before a bad night arrives.</p>\n<h2>Some bots can run mini systems</h2>\n<p>Depending on the service, a chatbot may also manage:</p>\n<ul>\n<li>loyalty points</li>\n<li>giveaways</li>\n<li>quote commands</li>\n<li>song requests</li>\n<li>queue systems for viewer games</li>\n<li>custom alerts or overlays</li>\n</ul>\n<p>These extras can be useful, but they are not the core job. A bot that handles commands and moderation well is already doing enough for most channels.</p>\n<h2>What a bot cannot do for you</h2>\n<p>There are limits that matter.</p>\n<p>A chatbot cannot decide whether a joke crossed the line in your community. It cannot know whether a regular viewer needs a gentle warning or a timeout. It cannot create a warm atmosphere by itself.</p>\n<p>Good moderation still needs judgement. Good community building still needs the streamer to show up, respond well, and set the tone.</p>\n<h2>Signs you should set one up now</h2>\n<p>You do not need to wait for a big audience. 
A bot is worth adding when any of these are true:</p>\n<ul>\n<li>you answer the same question on most streams</li>\n<li>spam or weird links show up even occasionally</li>\n<li>you want a rules command that moderators can trigger</li>\n<li>you run community nights, raffles, or other repeat formats</li>\n</ul>\n<p>At that point, the setup time is paid back quickly.</p>\n<h2>Signs you are asking too much of it</h2>\n<p>Trouble starts when the bot becomes the loudest voice in the room. That often looks like:</p>\n<ul>\n<li>too many timers</li>\n<li>long command replies</li>\n<li>automated jokes every few minutes</li>\n<li>loyalty systems nobody understands</li>\n<li>a command list larger than your mods can remember</li>\n</ul>\n<p>The cleaner version almost always wins. Chat should feel helped, not managed.</p>\n<h2>Picking the right starting feature set</h2>\n<p>For a new or modest-sized channel, this is enough:</p>\n<ul>\n<li>three or four commands</li>\n<li>one moderation filter setup</li>\n<li>one occasional timer</li>\n</ul>\n<p>Once those are working, add extra features only if you can name the problem they solve. That keeps the bot aligned with the stream instead of becoming a hobby project in its own right.</p>\n<h2>The real value</h2>\n<p>The strongest chatbot setup is usually the least noticeable one. Viewers get their answers faster. Mods handle issues with less friction. The streamer spends less time typing and more time staying present.</p>\n<p>That is the whole job. Everything beyond that is optional decoration.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "streaming",
        "twitch"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/trustcut-ai-chat-for-finding-your-barber-on-trustcut",
      "url": "https://soulstack.co.uk/blog/trustcut-ai-chat-for-finding-your-barber-on-trustcut",
      "title": "An AI chat to help you find your barber in the UK",
      "summary": "TrustCut AI is a free chat assistant that helps clients find the right barber, compare services and check live availability. It is live now at the TrustCut AI page and as a floati…",
      "content_html": "<p>TrustCut AI is a free chat assistant that helps clients find the right barber, compare services and check live availability. It is live now at the TrustCut AI page and as a floating widget on the homepage. This post explains what the chat does today, what it does not do, and where it sits next to the AI features in the larger booking platforms.</p>\n<h2>What TrustCut AI does today</h2>\n<p>The chat is grounded in real TrustCut data. When you ask it to find a barber near a UK postcode, it queries the same listings that power the public profile pages. When you ask about service prices, it reads the same service menus the booking page renders. When you ask a barbering question, the chat searches the TrustCut article library and replies in plain English.</p>\n<p>A typical session looks like this. A client types: find me a barber in central Manchester who does skin fades on a Saturday afternoon. The chat returns a short list of matching public profiles, marks which ones take bookings on TrustCut and which are discovery only, and links straight to each profile.</p>\n<p>You can also ask the chat to compare services, explain barbering terminology, walk you through aftercare, or summarise what a particular shop offers. The answers are short, link to the source profile or article, and avoid sales copy.</p>\n<h2>What it does not do today</h2>\n<p>The chat does not take a booking on its own. Once you have chosen a barber, the booking flow stays on the barber profile, with deposits and confirmations handled by TrustCut in the usual way. The chat also does not ask for personal data and does not follow up with marketing.</p>\n<p>It also will not invent listings. 
If a search returns no public profiles for a town or service, the chat says so plainly and suggests a wider search or a different service type.</p>\n<h2>How TrustCut AI fits next to other barbering platforms</h2>\n<p>This is the honest competitive picture as of May 2026.</p>\n<p>Booksy, Fresha and Squire shipped AI features during 2024 and 2025. Booksy integrates with the Google AI Mode search experience, so a query like find me a barber Saturday afternoon can route into live Booksy availability when a shop is on the Booksy marketplace. Fresha runs an AI receptionist that handles client questions for individual salons, plus an Intelligent Scheduling feature that helps fill gaps. Squire ships AI phone answering, smart rebooking and AI marketing copy aimed at shop owners. All three lean toward AI that fills slots on the platform&#39;s own marketplace.</p>\n<p>Nearcut and Treatwell, two platforms with a strong UK barber presence, do not advertise an AI chat feature at the time this article was written.</p>\n<p>The gap TrustCut AI fills is a neutral client side assistant. It does not exist to sell slots for a single marketplace. It includes discovery profiles for barbers who do not take bookings on TrustCut, and it says so up front, so the answer is genuinely useful for a client who is choosing where to walk in or call.</p>\n<h2>Privacy and accuracy</h2>\n<p>Chats are not used to train AI models. The chat does not ask for personal data. If a user volunteers a name, phone number or address, the chat is instructed to ignore it and steer back to the search.</p>\n<p>Accuracy notes. The competitor feature list above is correct at the time of writing. AI features in this market change every month. Each comparison claim in this article is dated to May 2026, and the article will be revisited when there is a material change. 
The version history sits on the TrustCut blog so the date of the latest review is visible at the top of the article.</p>\n<h2>Try the chat</h2>\n<p>Open the TrustCut AI page from the main navigation or use the floating widget on the homepage. No sign-in is required. If you run a barbershop and want clients to find you through the chat, claim a free TrustCut profile and the shop will appear in the public listings the chat searches.</p>\n<h2>Status note, May 2026</h2>\n<p>This article is dated to May 2026. The TrustCut AI features described above are live as part of the discovery-only launch. Booking, deposits and review handling continue to work the same way on profile pages. If any of the competitor feature claims above stop being correct, this article will be updated and the change noted at the top.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "trustcut",
        "ai"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/spec-driven-development-with-ai",
      "url": "https://soulstack.co.uk/blog/spec-driven-development-with-ai",
      "title": "Spec driven development with AI",
      "summary": "Traditional software projects often start with vague requirements that evolve during implementation. AI coding agents magnify this problem because they follow prompts literally an…",
      "content_html": "<p>Traditional software projects often start with vague requirements that evolve during implementation. AI coding agents magnify this problem because they follow prompts literally and cannot infer hidden assumptions. Spec driven development (SDD) addresses this by writing a clear specification before coding begins. The specification becomes a contract: the agent derives code from the spec and validates its work against it. This post explores SDD with GitHub Spec Kit, Kiro, Claude Code and Codex.</p>\n<h2>Why spec driven development</h2>\n<p>Spec driven development flips the relationship between specs and code. GitHub&#39;s Spec Kit describes this as flipping the script on traditional software development, where for decades code has been king and specifications were just scaffolding. Specs capture the reasoning behind decisions and become living documents that evolve alongside the code.</p>\n<p>In an article on Martin Fowler&#39;s site, Birgitta Boeckeler describes three levels of spec driven development. Spec-first means a well thought out spec is written first and then used in the AI assisted workflow for the task at hand. Spec-anchored means the spec is kept after the task is complete, to keep using it for evolution and maintenance of that feature. Spec-as-source means the spec is the main source file over time, the human only edits the spec and never touches the generated code. These three framings help teams decide how much process they actually need.</p>\n<h2>GitHub Spec Kit</h2>\n<p>Spec Kit is an open source toolkit that operationalises SDD. 
You install the specify CLI from the Spec Kit repository with uv, then scaffold a project with specify init:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">specify</span><span style=\"color:#A5D6FF\"> init</span><span style=\"color:#A5D6FF\"> my-project</span></span></code></pre></div><p>Once initialised, you create a project constitution and then drive the workflow with slash commands inside your agent:</p>\n<ul>\n<li>/speckit.constitution sets high level principles for the project.</li>\n<li>/speckit.specify creates the feature specification with requirements and constraints.</li>\n<li>/speckit.plan creates a technical plan outlining architecture and tasks.</li>\n<li>/speckit.tasks generates tasks from the plan.</li>\n<li>/speckit.implement executes the tasks using an AI coding agent.</li>\n<li>/speckit.taskstoissues turns tasks into GitHub issues for collaboration.</li>\n</ul>\n<p>Additional optional commands such as /speckit.clarify, /speckit.analyze and /speckit.checklist help refine and cross-check the spec. Spec Kit treats the spec as a living document: update it when decisions change and regenerate the tasks to keep the implementation aligned.</p>\n<h2>Kiro specs</h2>\n<p>Kiro&#39;s workflow is inherently spec driven. You begin with a requirements document written in EARS (Easy Approach to Requirements Syntax) notation, which captures testable acceptance criteria in the form WHEN a condition THE SYSTEM SHALL a behaviour. The agent then generates a design document and a tasks document, breaking the work into actionable steps. Code changes happen only after the three documents exist. Every spec produces three files: requirements.md (or bugfix.md for a bug fix), design.md and tasks.md. 
Kiro can run independent tasks concurrently to speed up execution. Persistent project context lives in steering files such as product.md, tech.md and structure.md.</p>\n<h2>Plan based workflows in Claude Code</h2>\n<p>Claude Code does not require a full specification, but its plan mode encourages a similar discipline. Plan mode is a read-only mode where the agent researches the codebase and proposes changes without making them. You enter it with Shift+Tab, by prefixing a prompt with /plan, or with the --permission-mode plan flag. Claude presents a plan in the session for you to review, and you can open the proposed plan in your editor with Ctrl+G. Only after you approve the plan does Claude switch to a write capable mode and implement the changes. This is lighter than Spec Kit or Kiro, but it serves the same purpose: agree on the approach before any code is written.</p>\n<h2>Where Codex fits</h2>\n<p>Codex does not ship a formal multi-document spec workflow. Instead it reads standing instructions from AGENTS.md files and plans multi-file changes inside its agentic loop before executing them. If you want spec-like discipline with Codex, put your requirements, constraints and conventions in AGENTS.md and review the diffs the agent proposes. This gives you a persistent contract without the document scaffolding that Kiro and Spec Kit provide.</p>\n<h2>Differences and similarities</h2>\n<p>All of these approaches share one idea: write down what you want before coding. Spec Kit and Kiro formalise this into multiple documents with defined structures, while Claude Code offers a lighter plan mode and Codex relies on AGENTS.md plus its planning loop. Spec Kit integrates tightly with GitHub and provides commands to create specs, plans and tasks. Kiro uses EARS notation and separates requirements, design and tasks. Claude Code emphasises an approve-before-edit plan step. The right choice depends on how much process overhead your team can absorb and how complex the work is. 
In every case, keeping the spec or plan up to date is what prevents the agent from drifting away from the intended design.</p>\n<h2>Conclusion</h2>\n<p>Spec driven development gives AI coding agents a clear contract to follow. By defining requirements, constraints and plans before implementation, teams reduce ambiguity, enforce architecture and capture decisions for later. GitHub Spec Kit, Kiro, Claude Code plan mode and AGENTS.md driven Codex each support this workflow at a different level of formality. The discipline of writing and maintaining a specification remains the core of the practice, whatever tool you pick.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "spec-driven-development",
        "developer-workflow",
        "planning"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/agents-md-for-ai-coding-and-platform-agents",
      "url": "https://soulstack.co.uk/blog/agents-md-for-ai-coding-and-platform-agents",
      "title": "AGENTS.md for AI coding and platform agents",
      "summary": "AI coding agents need guidance to operate safely and effectively. Without clear instructions they may make unwanted changes, run unsafe commands or drift from architectural conven…",
      "content_html": "<p>AI coding agents need guidance to operate safely and effectively. Without clear instructions they may make unwanted changes, run unsafe commands or drift from architectural conventions. AGENTS.md is an open standard that addresses this. It is a markdown file placed at the root of a repository (and optionally in subdirectories) that gives coding agents the context and instructions they need. This post explains what AGENTS.md is, which tools read it, how it relates to other memory files and the current state of adoption.</p>\n<h2>What is AGENTS.md</h2>\n<p>The AGENTS.md project describes the file as a simple, open format for guiding coding agents, and as a README for agents: a dedicated, predictable place to provide the context and instructions that help AI coding agents work on your project. Where a traditional README explains a project to humans, AGENTS.md tells an agent how to build, test and run the software, along with environment requirements and coding conventions. It is just standard Markdown with no required fields, so you can use whatever headings suit your project, and it lives alongside your code in version control.</p>\n<p>The format supports nesting. In a monorepo you can place another AGENTS.md inside each package, and agents automatically read the nearest file in the directory tree, so the closest one takes precedence. This gives granular control over agent behaviour without duplicating global instructions in every folder.</p>\n<h2>Supported tools</h2>\n<p>AGENTS.md has been adopted across a wide range of agents and editors. The project site lists a broad and growing set of supporting tools, including Codex, Jules, Cursor, Windsurf, Aider, goose, opencode, Zed, Warp, VS Code, Devin, Junie, Amp, Gemini CLI, the GitHub Copilot coding agent and Augment Code, among others. These tools read AGENTS.md automatically when it is present. 
For example, Codex loads AGENTS.md from its home directory, the project root and each directory down to the working directory, concatenating them with closer files overriding more distant ones. Kiro also recognises AGENTS.md and, unlike its regular steering files, always includes it.</p>\n<p>The breadth of adoption is significant: the project site reports tens of thousands of open source projects using AGENTS.md (over 60,000 at the time of writing). Because the format is open and human readable, new tools can implement support easily, so treat any specific count as a moving figure.</p>\n<h2>Relationship to other memory files</h2>\n<p>AGENTS.md is one of several mechanisms for providing context to AI agents. It differs from CLAUDE.md (used by Claude Code) and steering files (used by Kiro) in scope and intent. AGENTS.md holds operational policy and explicit build and test commands, whereas CLAUDE.md holds instructions and context specific to Claude Code, and steering files record conventions and patterns for Kiro at the workspace or global level. You can keep an AGENTS.md alongside these other files, and tools that support multiple formats will use the ones they understand. With Codex, for instance, AGENTS.md carries team-wide instructions while its memory feature holds personal preferences.</p>\n<h2>Current adoption and patterns</h2>\n<p>A growing number of projects and organisations standardise on AGENTS.md. The most effective files contain command-first instructions with clear done criteria rather than vague prose, because explicit commands and acceptance checks are what agents act on reliably. Adoption is broad but not universal: some tools still rely on their own formats, and there is no single governing body enforcing the standard. 
Even so, the simple markdown format and wide tooling support make AGENTS.md a practical way to give agents the context they need.</p>\n<h2>Conclusion</h2>\n<p>AGENTS.md is a lightweight but powerful way to convey build and run instructions to AI coding agents. By separating operational policy from human documentation, it helps agents act consistently across environments. Many modern tools support it out of the box and adoption continues to grow. Teams should consider adding an AGENTS.md file to their repositories, alongside any other memory mechanisms, to make sure AI agents understand how to build and test their code.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "agents",
        "agents-md",
        "developer-workflow"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/claude-code-for-software-and-platform-teams",
      "url": "https://soulstack.co.uk/blog/claude-code-for-software-and-platform-teams",
      "title": "Claude Code for software and platform teams",
      "summary": "Claude Code is an AI coding assistant from Anthropic. It runs locally on your machine but connects to Anthropic's models in the cloud. You interact with it through a terminal, a d…",
      "content_html": "<p>Claude Code is an AI coding assistant from Anthropic. It runs locally on your machine but connects to Anthropic&#39;s models in the cloud. You interact with it through a terminal, a desktop app or editor integrations. Claude Code can read, edit and run code in your repository, propose plans for complex tasks and remember your preferences through memory files. This post explains how to install it, configure its permissions, use plan mode and decide when it fits your workflow.</p>\n<h2>What Claude Code is</h2>\n<p>Claude Code is available on the web, as a desktop app, in VS Code and JetBrains IDEs, in Slack, and in CI/CD with GitHub Actions and GitLab. The core experience runs in your terminal: you start a session with claude, point it at a repository and chat with the agent. It can read files, suggest changes, run tests and commit code. Because it runs locally, your source stays on your machine unless you share context. Claude Code uses Anthropic&#39;s models and adds coding-specific tools and memory.</p>\n<h2>Installation</h2>\n<p>On macOS, Linux and WSL you install with a shell script. On Windows you use a PowerShell command. 
The commands differ by platform:</p>\n<div class=\"code-group\" data-code-group><div class=\"code-group__tabs\" role=\"tablist\"><button class=\"code-group__tab is-active\" type=\"button\" role=\"tab\" aria-selected=\"true\" data-cg-tab=\"0\">Bash</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"1\">PowerShell</button></div><div class=\"code-group__panel is-active\" role=\"tabpanel\" data-cg-panel=\"0\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">curl</span><span style=\"color:#79C0FF\"> -fsSL</span><span style=\"color:#A5D6FF\"> https://claude.ai/install.sh</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> bash</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"1\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>irm https://claude.ai/install.ps1 | iex</span></span></code></pre></div></div><p>The installer downloads the CLI, registers it on your path and opens a login prompt. You can also install with Homebrew (brew install --cask claude-code), WinGet (winget install Anthropic.ClaudeCode) or npm (npm install -g @anthropic-ai/claude-code, which needs a recent Node.js release, currently Node.js 18 or later; check the install docs for the current minimum). After installing, run claude to start a session and log in.</p>\n<h2>Permission modes</h2>\n<p>Claude Code uses permission modes to control what the agent can do without asking. 
You cycle through the three core modes with Shift+Tab during a session. The full set is:</p>\n<ul>\n<li>default. Claude reads files without asking, but prompts before each file edit, shell command or network request.</li>\n<li>acceptEdits. Claude accepts file edits automatically so you are not prompted for each one.</li>\n<li>plan. A read-only mode for exploring a codebase and proposing changes without making them.</li>\n<li>auto. Claude executes without permission prompts, while a separate classifier model reviews actions before they run. It is a research preview and requires a recent version of Claude Code; see the permission-modes docs for the current minimum.</li>\n<li>dontAsk. Claude automatically denies any tool call that would otherwise prompt, for locked-down CI or scripts.</li>\n<li>bypassPermissions. Claude runs anything without prompts, intended for isolated sandbox containers.</li>\n</ul>\n<p>By default, Shift+Tab cycles through default, acceptEdits and plan. Once they are enabled, auto and bypassPermissions also join the cycle; dontAsk is never in it and is selected with the --permission-mode flag. You can configure a default mode in your settings or override it per session.</p>\n<h2>Memory and CLAUDE.md</h2>\n<p>Claude Code starts each session fresh, but you can provide persistent context through memory files. There are two mechanisms.</p>\n<ul>\n<li>CLAUDE.md is a markdown file you write yourself, holding conventions, architecture notes and rules of thumb. You can place it at the project level, the user level (~/.claude/CLAUDE.md) or the organisation level through a managed policy file. CLAUDE.md files are loaded in full regardless of length, but Anthropic recommends targeting under 200 lines per file because shorter files produce better adherence.</li>\n<li>Auto memory is a set of notes Claude maintains in a MEMORY.md entrypoint under ~/.claude/projects/&lt;project&gt;/memory/. 
The first 200 lines of MEMORY.md, or the first 25 KB, whichever comes first, are loaded at the start of every conversation. This load limit applies only to MEMORY.md, not to CLAUDE.md. Auto memory is on by default in recent versions of Claude Code, and you can manage it with the /memory command.</li>\n</ul>\n<p>Structure CLAUDE.md as imperative instructions with clear do and do-not lists, and keep it concise. Because memory is context rather than a rule engine, you still need tests and code review to enforce policy.</p>\n<h2>Plan mode</h2>\n<p>Plan mode is a read-only mode for researching a codebase and proposing changes without executing anything. You enter it with Shift+Tab, by prefixing a prompt with /plan, or with the --permission-mode plan flag. Claude reads the relevant code and presents a plan in the session for you to review, and you can open the proposed plan in your editor with Ctrl+G. When you approve, Claude offers to continue in auto mode, accept edits or hand changes back for manual review. Plan mode is useful for large refactors or new modules where you want to agree on the approach before any code changes.</p>\n<h2>Strengths</h2>\n<ul>\n<li>Local execution. Claude Code runs on your machine against your local checkout, with model requests going to Anthropic&#39;s cloud, and remote integrations such as Slack and CI/CD are optional.</li>\n<li>Rich permission model. The permission modes let you tune autonomy from prompt-on-each-tool up to fully sandboxed bypass.</li>\n<li>Persistent context. CLAUDE.md and auto memory let the agent remember conventions across sessions.</li>\n<li>Approve before editing. Plan mode lets you review the approach before changes are made.</li>\n</ul>\n<h2>Limits</h2>\n<p>Claude Code&#39;s local-first design means some hosted-only features are out of scope. Auto memory loads only the first 200 lines or 25 KB of MEMORY.md, so very large auto-memory notes get truncated and large standing guidance belongs in a concise CLAUDE.md. 
The permission model can feel intrusive when you want to move fast, since you toggle modes or approve actions. Finally, Claude Code is a proprietary tool: the CLI is free to install, but full use requires an Anthropic plan.</p>\n<h2>When to choose Claude Code</h2>\n<p>Use Claude Code when you want a private, controllable AI assistant that can plan and implement changes on your machine. Its memory and permission systems suit regulated environments where predictable behaviour and auditability matter, and plan mode is particularly useful for architectural work and large refactors. If you mainly need quick inline suggestions or prefer a fully open source tool, another agent may suit you better.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "tooling",
        "claude-code",
        "developer-workflow"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/claude-code-team-feature-explained",
      "url": "https://soulstack.co.uk/blog/claude-code-team-feature-explained",
      "title": "Claude Code team feature explained",
      "summary": "Claude Code can run multiple agents that work together. A single session can already launch subagents that report back to the main agent, but the agent teams feature goes further:…",
      "content_html": "<p>Claude Code can run multiple agents that work together. A single session can already launch subagents that report back to the main agent, but the agent teams feature goes further: it creates separate Claude Code instances that communicate directly. The feature is experimental and disabled by default. When enabled, a team lead coordinates other Claude Code instances working on related tasks. This post explains what agent teams are, how they differ from subagents, how to enable them and where they help.</p>\n<h2>What agent teams are</h2>\n<p>In a standard Claude Code session you can launch subagents to handle individual tasks. Subagents run within the same session: they do their work and report results back to the main agent, and they never communicate with each other. Agent teams are different. Teammates are independent Claude Code instances. They share a task list, claim work and communicate directly with one another through a built-in messaging system. Because each teammate has its own context, the team can divide work that a single context window would struggle to hold.</p>\n<h2>How to enable it</h2>\n<p>Agent teams require a recent version of Claude Code. Check the official agent teams page for the current minimum, and confirm your version with claude --version. The feature is disabled by default; you enable it by setting CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to 1, either as an environment variable or in settings.json. When you start a session with teams enabled, Claude acts as the team lead. The lead creates a shared task list and spawns teammates as needed. Each teammate works on its share of the tasks, and teammates can message the lead or each other to share findings and coordinate. You can also message an individual teammate directly. The lead monitors progress and presents a combined result.</p>\n<p>Because each teammate runs in its own context, teammates can read different files, hold separate state and run commands concurrently. 
That makes agent teams well suited to work that spans multiple modules or layers of a system, where subagents, which report only to the main agent and never to each other, would be more constrained.</p>\n<h2>Who it is for</h2>\n<p>The documentation describes several use cases: research and review tasks that split across multiple agents, building a new module that touches several layers of a system, debugging with competing hypotheses, and cross-layer coordination such as frontend, backend and tests. Each teammate keeps its own context and can talk to the others, so a complex task can be split into parts that progress in parallel.</p>\n<h2>Strengths</h2>\n<ul>\n<li>Parallel execution. Multiple Claude Code instances work in parallel on different parts of a task, which can reduce overall completion time.</li>\n<li>Shared coordination. Each teammate keeps its own context window but shares a task list and a messaging system with the others, so a discovery by one teammate can inform the rest.</li>\n<li>Division of labour. The lead assigns work across teammates and assembles the result.</li>\n</ul>\n<h2>Limitations</h2>\n<p>The feature is experimental, and the documentation lists concrete limitations to plan around:</p>\n<ul>\n<li>One team at a time. You can run a single team per session.</li>\n<li>No nested teams. Teammates cannot spawn their own teams or teammates.</li>\n<li>No session resumption with in-process teammates. Resuming or rewinding a session does not restore teammates that were running in that session.</li>\n<li>Operational rough edges. Task status can lag behind the actual state, shutting a team down can be slow, the lead role is fixed once set, permissions are set when a teammate is spawned, and split-pane views require tmux or iTerm2.</li>\n</ul>\n<p>Teams also add cost. Each teammate consumes tokens and runs as its own session, so total spend is higher than for a single session. 
For simple or linear tasks, a single session or subagents is usually enough.</p>\n<h2>When to use agent teams</h2>\n<p>Use agent teams when the work divides naturally and you want to reduce turnaround time. Good examples include investigating several components of a bug at once, developing a feature that spans backend, frontend and infrastructure, reviewing a large change by splitting files across teammates, and running competing design hypotheses in parallel. If your task is sequential, isolated to a single file or small in scope, a single session or subagents will usually be more efficient. When you do use a team, monitor each teammate&#39;s output and integrate the work carefully so that parallel edits do not conflict.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "tooling",
        "claude-code",
        "teams"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/codex-for-software-and-platform-teams",
      "url": "https://soulstack.co.uk/blog/codex-for-software-and-platform-teams",
      "title": "Codex for software and platform teams",
      "summary": "Codex is OpenAI's coding agent that runs on your machine. It is open source, reads your repository, makes changes and runs commands based on natural language prompts. The current…",
      "content_html": "<p>Codex is OpenAI&#39;s coding agent that runs on your machine. It is open source, reads your repository, makes changes and runs commands based on natural language prompts. The current CLI is designed for agentic coding: it plans multi-file changes, executes commands and verifies results. This post covers installation, interactive and headless workflows, the memory feature and considerations for choosing Codex.</p>\n<h2>What Codex is</h2>\n<p>Codex CLI launches an interactive terminal session where you can talk to the agent about your code. You can ask it to implement features, refactor modules, write tests or explain unfamiliar logic. It understands project structure, coordinates changes across multiple files, and can run shell commands such as builds, tests and linters, then react to the output and iterate. The agent runs locally, so your code stays in your environment unless you choose to share context. If you do not specify a model, Codex uses a recommended default. OpenAI currently recommends starting with its flagship model for most coding tasks, and it is available whether you sign in with a ChatGPT account or use API key authentication.</p>\n<h2>Installation</h2>\n<p>Codex is distributed via npm. Install it globally:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> install</span><span style=\"color:#79C0FF\"> -g</span><span style=\"color:#A5D6FF\"> @openai/codex</span></span></code></pre></div><p>A Homebrew cask (brew install --cask codex) and an official install script are also available. After installing, run codex and sign in when prompted (you can also authenticate explicitly with codex login). 
You can sign in with your ChatGPT account, which gives access to the latest models, or supply an API key for CI automation. Configuration lives in ~/.codex/config.toml, where you set keys such as model, model_reasoning_effort, approval_policy and sandbox_mode. Codex also looks for AGENTS.md files in its home directory, the project root and the working directory to load project-specific instructions.</p>\n<h2>Interactive workflow</h2>\n<p>Running codex launches the interactive session. You describe what you want, and Codex plans the steps, shows proposed edits, runs tests and iterates until the goal is met. You can control the agent with slash commands, change models, and resume earlier sessions. In the Codex app you can also use Worktrees, which run multiple independent tasks in parallel git worktrees so they do not interfere with each other; you can hand a thread off between local and worktree execution.</p>\n<h2>Headless workflows with codex exec</h2>\n<p>For automation and CI, use the headless codex exec command (alias codex e). You pass a prompt and Codex runs the task and prints its output to stdout. Capture just the final message with --output-last-message, and emit newline-delimited JSON events with --json when you need to parse progress. You can integrate it into CI pipelines, git hooks or cron jobs. 
A simple one-shot looks like this, with a Python wrapper for scripts that already run in Python:</p>\n<div class=\"code-group\" data-code-group><div class=\"code-group__tabs\" role=\"tablist\"><button class=\"code-group__tab is-active\" type=\"button\" role=\"tab\" aria-selected=\"true\" data-cg-tab=\"0\">Bash</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"1\">Python</button></div><div class=\"code-group__panel is-active\" role=\"tabpanel\" data-cg-panel=\"0\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">codex</span><span style=\"color:#A5D6FF\"> exec</span><span style=\"color:#A5D6FF\"> \"Update the version number in package.json to 2.1.0\"</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"1\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> subprocess</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">subprocess.run([</span><span style=\"color:#A5D6FF\">\"codex\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">\"exec\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">\"Update the version number in package.json to 2.1.0\"</span><span style=\"color:#E6EDF3\">], </span><span style=\"color:#FFA657\">check</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#79C0FF\">True</span><span style=\"color:#E6EDF3\">)</span></span></code></pre></div></div><p>On GitHub 
Actions, the official Codex GitHub Action is the recommended path: it installs and authenticates Codex for you, with the API key supplied from a repository secret rather than exported as a job-level variable. The following is a GitHub Actions workflow file, not a shell command:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#7EE787\">jobs</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">  update_changelog</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">    runs-on</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">ubuntu-latest</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">    steps</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">      - </span><span style=\"color:#7EE787\">uses</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">actions/checkout@v5</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">      - </span><span style=\"color:#7EE787\">uses</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">openai/codex-action@v1</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">        with</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">          openai-api-key</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">${{ secrets.OPENAI_API_KEY }}</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">          prompt</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">\"Update CHANGELOG for the next release based on commits since 
the last tag\"</span></span></code></pre></div><p>Control autonomy with --sandbox (read-only, workspace-write or danger-full-access) and --ask-for-approval (untrusted, on-request or never). For batch work you can loop over files and run codex exec once per file.</p>\n<h2>Memory</h2>\n<p>Codex has an opt-in memory feature that stores generated facts, preferences and project context across sessions. Memories live under ~/.codex/memories/, and you control them with the /memories command in the Codex app and TUI. The feature is off by default; enable it by setting memories = true under the [features] table in config.toml. Memory works best for personal conventions such as &quot;use pnpm instead of npm&quot; or &quot;the auth module is in src/core/auth/&quot;. For team-wide instructions, use AGENTS.md, which Codex loads automatically, rather than memory.</p>\n<h2>Strengths</h2>\n<ul>\n<li>Local first. Codex runs on your machine and modifies your code directly, so you keep privacy and low latency.</li>\n<li>Flexible workflows. You can work in the interactive TUI or run tasks headless with codex exec, including JSON output for automation.</li>\n<li>Standing instructions. AGENTS.md and the opt-in memory feature let you encode conventions so you do not repeat yourself.</li>\n<li>Parallel work. The Worktrees feature lets Codex run independent tasks in isolated git worktrees.</li>\n</ul>\n<h2>Limits</h2>\n<p>Codex is still evolving, and some features may change. The CLI requires Node.js for the npm install, and running with full access can modify files or run commands you did not expect, so review diffs or use approval policies. Memory and AGENTS.md live locally and are not shared across machines unless you sync them yourself. While Codex is open source, using it with hosted models incurs subscription or API costs.</p>\n<h2>When to choose Codex</h2>\n<p>Choose Codex when you want a local coding agent that can run offline, integrate into CI/CD and be scripted. 
It suits teams that value scriptable, repeatable automation and headless runs. If you rely heavily on GitHub services or want built-in cloud integration, GitHub Copilot may fit better. For an approve-before-edit plan step and rich permission modes, Claude Code is worth a look. Codex is strongest where flexible scripting and local control matter most.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "tooling",
        "codex",
        "developer-workflow"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/github-copilot-for-software-and-platform-teams",
      "url": "https://soulstack.co.uk/blog/github-copilot-for-software-and-platform-teams",
      "title": "GitHub Copilot for software and platform teams",
      "summary": "GitHub Copilot is an AI coding assistant that integrates with editors, the terminal and the GitHub website. It generates code suggestions in real time, answers questions about you…",
      "content_html": "<p>GitHub Copilot is an AI coding assistant that integrates with editors, the terminal and the GitHub website. It generates code suggestions in real time, answers questions about your repository and can plan and execute multi-step tasks. Copilot is more than a prompt generator: its agent mode can analyse your codebase, propose changes, run commands and iterate until tests pass. This post explains how to set up Copilot, how its workflow differs from other tools, its memory feature, strengths and limits, and when to pick it.</p>\n<h2>What Copilot is</h2>\n<p>At its core, Copilot provides inline code suggestions in your editor and a chat interface. In the editor it predicts the next lines of code or entire functions from the surrounding context. Chat can answer questions about code, suggest fixes and generate documentation. Copilot also has a command-line interface, a code review feature that suggests improvements on your changes, and pull request summaries that describe what a change does. Agent mode is the most capable surface: it interprets a high-level goal, plans a sequence of steps, runs commands and tests, and refines its work over several iterations.</p>\n<h2>Setup and access</h2>\n<p>Copilot is available to subscribers on certain personal and enterprise plans, so check the GitHub documentation for current availability. To get started, install the Copilot extension in Visual Studio Code or JetBrains IDEs, or enable it on GitHub. For the terminal, install the Copilot CLI, which is generally available and adds a copilot command; you install it with npm (npm install -g @github/copilot, which needs a recent Node.js release, currently Node.js 22 or later; check the install docs for the current minimum), then launch it by typing copilot and authenticating with /login. 
Copilot Memory is enabled per user rather than per repository, and repository owners can review and delete the repository-level facts stored for their repository.</p>\n<h2>Core workflow</h2>\n<p>In typical use, Copilot runs in your editor and offers suggestions as you type. You accept a suggestion with the tab key or ask chat for clarification or alternatives. You can ask Copilot to write unit tests, refactor functions or explain code. In the terminal, the copilot command provides an agentic chat that can run commands, edit files and interact with GitHub.com.</p>\n<p>Agent mode changes the workflow. Instead of producing a single suggestion, Copilot plans an end-to-end solution. It reads your repository, understands dependencies, proposes a plan and then executes by editing files and running commands, looping through planning, applying changes, running tests and refining until the goal is met. It can detect errors in terminal output, test results and builds, and iterate to fix them. You can review each step and adjust the plan. The Model Context Protocol lets you add custom tools and external resources that agent mode can call.</p>\n<h2>Memory</h2>\n<p>Copilot&#39;s agentic memory stores repository-level facts and user-level preferences. Repository facts cover conventions, build commands and cross-file dependencies, and are available to anyone with access to Copilot Memory in that repository. User preferences capture personal coding style and are visible only to that user, though on Business and Enterprise plans an organisation or enterprise administrator can export or delete them. Memory is shared across the Copilot coding agent, code review and the CLI, so a fact captured by one can be used by another. Unused facts and preferences are deleted automatically after a period of disuse (currently 28 days). 
Repository-level facts are scoped to the repository they came from, while user preferences follow the user across their interactions.</p>\n<h2>Strengths</h2>\n<ul>\n<li>Context-aware suggestions. Copilot reads multiple files, learns your style and suggests idiomatic code, and its memory keeps suggestions consistent across sessions.</li>\n<li>Integrated chat and review. Chat answers questions, while code review suggests improvements and pull request summaries describe what changed, saving time during review cycles.</li>\n<li>Agentic automation. For tasks like migrating frameworks, adding tests or refactoring large modules, agent mode handles the repetitive steps, runs tests and iterates.</li>\n<li>Cross-surface availability. Copilot works in editors, the terminal and on GitHub, so you can use it where you work.</li>\n</ul>\n<h2>Limitations</h2>\n<p>Copilot is not a replacement for engineering judgement. It can generate incorrect or insecure code, so you still need tests and human review to catch mistakes. Agent mode increases autonomy, but it works best with clear goals and good tests. Repository-level memory is scoped to a single repository, and entries that go unused are deleted automatically (currently after 28 days), so it does not carry knowledge between unrelated projects on its own. Finally, Copilot is a paid product whose access is controlled by GitHub and may not be available in all regions.</p>\n<h2>When to choose Copilot</h2>\n<p>Use Copilot when you want an AI assistant deeply integrated with GitHub and your editor. It excels at routine coding, boilerplate generation and code review. Agent mode is valuable when you have well-defined goals and tests that the agent can run repeatedly, such as migrating a codebase to a new framework or adding comprehensive tests. Because repository facts stay scoped to their repository and agent behaviour is governed by your approval settings, it fits teams that want automation with clear guardrails. 
It is less suitable where strict local control is required or where open source tools are preferred.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "tooling",
        "github-copilot",
        "developer-workflow"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/kiro-for-spec-driven-software-and-platform-work",
      "url": "https://soulstack.co.uk/blog/kiro-for-spec-driven-software-and-platform-work",
      "title": "Kiro for spec driven software and platform work",
      "summary": "Kiro is a development environment that combines a code editor with AI agents and a formal specification workflow. Its goal is to move beyond ad hoc prompting by asking teams to de…",
      "content_html": "<p>Kiro is a development environment that combines a code editor with AI agents and a formal specification workflow. Its goal is to move beyond ad hoc prompting by asking teams to define what they want in a structured way and then letting the agent implement that plan. Kiro is based on Code OSS, so it feels similar to Visual Studio Code, and it suits platform teams that value repeatable processes and clear specifications.</p>\n<h2>What Kiro is</h2>\n<p>Kiro converts a request into structured requirements using EARS (Easy Approach to Requirements Syntax) notation, then builds a design and a task list before writing code. EARS expresses acceptance criteria in a testable form such as WHEN a condition THE SYSTEM SHALL a behaviour. The tool keeps natural language interaction at the front while still requiring the rigour of written specifications, and it can run independent tasks in parallel to speed up execution.</p>\n<h2>Setup</h2>\n<p>Kiro is distributed as an IDE. You download it from kiro.dev and install it like any other editor. Because it is based on Code OSS you can import your VS Code settings, themes and Open VSX compatible plugins. Kiro runs on Windows, macOS and Linux. You sign in with one of several options, including GitHub, Google, an AWS Builder ID or AWS IAM Identity Center. AWS integration is optional and you do not need an AWS account to use Kiro.</p>\n<h2>Core workflow</h2>\n<p>Kiro&#39;s workflow is explicitly spec driven. Every spec generates three files. You start with a requirements document (requirements.md, or bugfix.md for a bug fix) that describes the change as numbered user stories and EARS acceptance criteria. Next the agent produces a design document (design.md) that explains the architecture, data flow and key functions. Finally Kiro produces a tasks document (tasks.md) that breaks the design into concrete tasks. Code changes happen only after these documents exist. 
Tasks run sequentially or in parallel depending on their dependencies, and Kiro can run all independent tasks concurrently in waves.</p>\n<p>For each task the agent proposes code changes and shows diffs, runs tests and updates the task status. Because the work is defined in tasks.md, you always know what the agent will do next. Bugfix specs follow the same three-phase structure, capturing current behaviour, expected behaviour and what must stay unchanged.</p>\n<h2>Steering and memory</h2>\n<p>Kiro gives the agent persistent context through steering files. These markdown files live in .kiro/steering/ for a specific workspace or ~/.kiro/steering/ for global settings. Three foundational files are included by default: product.md defines purpose, users and goals, tech.md records frameworks, libraries and technical constraints, and structure.md describes file organisation, naming conventions and architectural decisions. Steering files support inclusion modes: always (the default), fileMatch for conditional loading, manual for on-demand inclusion, and auto. You can add your own steering files for API standards, testing approaches or deployment workflows. Kiro also recognises AGENTS.md, which is always included and does not use inclusion modes.</p>\n<h2>Strengths</h2>\n<ul>\n<li>Structured development. By making you articulate requirements, design and tasks before implementation, Kiro reduces ambiguity and encourages thoughtful design, and the specs act as living documentation.</li>\n<li>Persistent context. Steering files give the agent memory of your conventions, so it does not need reminding of preferred frameworks, folder structures or naming patterns.</li>\n<li>Parallel execution. Kiro runs independent tasks concurrently, which accelerates long-running changes across a large codebase.</li>\n</ul>\n<h2>Limits</h2>\n<p>Kiro assumes a formal specification exists for every change. For one-line fixes or exploratory coding this overhead can feel heavy. 
EARS notation and the three-document structure add a learning cost, and tasks need to be kept in step with the spec. Because Kiro emphasises a controlled workflow, it is less suited to rapid prototyping. Kiro is also proprietary: although it builds on open source components, it requires an account.</p>\n<h2>When to choose Kiro</h2>\n<p>Choose Kiro when you want the discipline of spec driven development. It is especially useful for platform teams building internal services or infrastructure that need traceability and strong conventions. The combination of formal requirements, design documentation and task orchestration helps align engineers and maintain quality across a large codebase. For small projects or quick experiments, a lighter tool may be more appropriate.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "tooling",
        "kiro",
        "spec-driven-development"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/lightweight-scripted-ai-workflows-for-engineering-teams",
      "url": "https://soulstack.co.uk/blog/lightweight-scripted-ai-workflows-for-engineering-teams",
      "title": "Lightweight scripted AI workflows for engineering teams",
      "summary": "AI agents are often presented as interactive assistants, but they can also be invoked programmatically. When you have repeatable tasks such as updating version numbers, generating…",
      "content_html": "<p>AI agents are often presented as interactive assistants, but they can also be invoked programmatically. When you have repeatable tasks such as updating version numbers, generating changelogs or reviewing diffs, you can script those actions using the headless modes of modern coding agents. This post focuses on the Codex CLI command codex exec, which runs the agent non-interactively. Similar patterns apply to other tools with scripting support.</p>\n<h2>Why script AI workflows</h2>\n<p>Interactive sessions are great for exploring a new problem, but automation shines when tasks repeat. In CI pipelines, pre-commit hooks or cron jobs you want predictable behaviour: the agent reads the repository, performs a task and exits with a clear result. Scripting reduces manual effort, ensures consistency and lets you integrate AI into existing automation. Because the agent runs headless, it can be part of your build system without requiring a human at the keyboard.</p>\n<h2>Running Codex in headless mode</h2>\n<p>Codex CLI offers codex exec (alias codex e), a command that runs Codex non-interactively for scripted and CI use. You pass a prompt, and Codex reads your repository, plans the changes, executes them and exits. By default it streams progress to stderr and prints only the final agent message to stdout. You can write that final message to a file with the --output-last-message flag, and emit newline-delimited JSON events with the --json flag when you want to parse progress programmatically.</p>\n<h3>Basic one-shot commands</h3>\n<p>You can call codex exec directly from a script. 
For example, to update a version number in a project file:</p>\n<div class=\"code-group\" data-code-group><div class=\"code-group__tabs\" role=\"tablist\"><button class=\"code-group__tab is-active\" type=\"button\" role=\"tab\" aria-selected=\"true\" data-cg-tab=\"0\">Bash</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"1\">Python</button></div><div class=\"code-group__panel is-active\" role=\"tabpanel\" data-cg-panel=\"0\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">codex</span><span style=\"color:#A5D6FF\"> exec</span><span style=\"color:#A5D6FF\"> \"Update the version number in package.json to 2.1.0\"</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"1\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> subprocess</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">subprocess.run([</span><span style=\"color:#A5D6FF\">\"codex\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">\"exec\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">\"Update the version number in package.json to 2.1.0\"</span><span style=\"color:#E6EDF3\">], </span><span style=\"color:#FFA657\">check</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#79C0FF\">True</span><span style=\"color:#E6EDF3\">)</span></span></code></pre></div></div><p>Codex parses the prompt, modifies package.json, runs 
any necessary commands and exits. Because the final message is printed to stdout, you can capture or log the result.</p>\n<h3>Controlling autonomy</h3>\n<p>Headless runs respect the same sandbox and approval controls as interactive sessions. Set the file system access level with --sandbox, which accepts read-only, workspace-write or danger-full-access. Set the approval behaviour with --ask-for-approval, which accepts untrusted, on-request or never. For a guarded run you can combine a writable workspace with on-request approvals (the official guidance prefers never for fully unattended, non-interactive runs):</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">codex</span><span style=\"color:#A5D6FF\"> exec</span><span style=\"color:#79C0FF\"> --sandbox</span><span style=\"color:#A5D6FF\"> workspace-write</span><span style=\"color:#79C0FF\"> --ask-for-approval</span><span style=\"color:#A5D6FF\"> on-request</span><span style=\"color:#A5D6FF\"> \"Add a unit test for the parse_config function\"</span></span></code></pre></div><p>You can also set defaults in ~/.codex/config.toml using the model, model_reasoning_effort, approval_policy and sandbox_mode keys, so individual scripts stay short.</p>\n<h3>Integrating with CI/CD</h3>\n<p>codex exec fits naturally into CI pipelines. On GitHub Actions, the official Codex GitHub Action is the recommended path: it installs and authenticates Codex for you, with the API key supplied from a repository secret rather than exported as a job-level variable. 
Note that this is a YAML workflow file, not a shell command:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#7EE787\">jobs</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">  update_changelog</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">    runs-on</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">ubuntu-latest</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">    steps</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">      - </span><span style=\"color:#7EE787\">uses</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">actions/checkout@v5</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">      - </span><span style=\"color:#7EE787\">uses</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">openai/codex-action@v1</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">        with</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">          openai-api-key</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">${{ secrets.OPENAI_API_KEY }}</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">          prompt</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">\"Update CHANGELOG for the next release based on commits since the last tag\"</span></span></code></pre></div><p>For other CI systems such as GitLab CI/CD or Jenkins, install the CLI with npm install -g @openai/codex and scope the API key to the single invocation rather than 
exporting it for the whole job.</p>\n<h3>Batch processing</h3>\n<p>You can combine shell loops with codex exec to perform the same action on multiple files. For example, adding type hints to every Python file in a directory:</p>\n<div class=\"code-group\" data-code-group><div class=\"code-group__tabs\" role=\"tablist\"><button class=\"code-group__tab is-active\" type=\"button\" role=\"tab\" aria-selected=\"true\" data-cg-tab=\"0\">Bash</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"1\">Python</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"2\">PowerShell</button></div><div class=\"code-group__panel is-active\" role=\"tabpanel\" data-cg-panel=\"0\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">for</span><span style=\"color:#E6EDF3\"> file </span><span style=\"color:#FF7B72\">in</span><span style=\"color:#A5D6FF\"> src/utils/*.py</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FF7B72\">do</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  codex</span><span style=\"color:#A5D6FF\"> exec</span><span style=\"color:#A5D6FF\"> \"Add type hints to all functions in </span><span style=\"color:#E6EDF3\">$file</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">done</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"1\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span 
style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> glob</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> subprocess</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">for</span><span style=\"color:#FFA657\"> file</span><span style=\"color:#FF7B72\"> in</span><span style=\"color:#E6EDF3\"> glob.glob(</span><span style=\"color:#A5D6FF\">\"src/utils/*.py\"</span><span style=\"color:#E6EDF3\">):</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">    subprocess.run([</span><span style=\"color:#A5D6FF\">\"codex\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">\"exec\"</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#FF7B72\">f</span><span style=\"color:#A5D6FF\">\"Add type hints to all functions in </span><span style=\"color:#FF7B72\">{</span><span style=\"color:#FFA657\">file</span><span style=\"color:#FF7B72\">}</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">], </span><span style=\"color:#FFA657\">check</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#79C0FF\">True</span><span style=\"color:#E6EDF3\">)</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"2\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Get-ChildItem -Path src/utils -Filter *.py | ForEach-Object {</span></span>\n<span class=\"line\"><span>    codex exec \"Add type hints to all functions in $($_.FullName)\"</span></span>\n<span class=\"line\"><span>}</span></span></code></pre></div></div><p>Each invocation of codex exec is isolated. Codex reads the specified file, applies the transformation and stops. 
For idempotent tasks this pattern is efficient and easy to reason about.</p>\n<h2>Tips for scripted workflows</h2>\n<ol>\n<li>Define completion criteria. Write prompts that describe both the action and what counts as success, for example &quot;Refactor this module to remove global variables and make sure the existing tests still pass.&quot;</li>\n<li>Review diffs. Even in headless mode, run a diff or a separate review step after the agent executes to confirm that the changes meet your standards.</li>\n<li>Use AGENTS.md for standing instructions. Codex loads AGENTS.md from ~/.codex, the project root and each directory down to the working directory, so your scripts inherit conventions without repeating them. Closer files override more distant ones.</li>\n<li>Combine with other tools. codex exec can sit inside a larger pipeline that includes linting, static analysis and deployment. Use exit codes to handle errors gracefully.</li>\n</ol>\n<h2>Conclusion</h2>\n<p>Lightweight scripting turns AI coding agents into practical automation tools. The headless codex exec command makes it easy to embed AI assistance into CI pipelines, cron jobs and simple scripts. By writing clear prompts, defining completion criteria and reviewing outputs, you can use AI to perform mundane tasks consistently and free your team to focus on higher value work.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "automation",
        "scripting",
        "platform-engineering"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/memory-management-for-ai-coding-and-platform-agents",
      "url": "https://soulstack.co.uk/blog/memory-management-for-ai-coding-and-platform-agents",
      "title": "Memory management for AI coding and platform agents",
      "summary": "AI coding agents deliver better results when they remember facts about your project and your preferences. Without memory you would have to remind the agent about naming convention…",
      "content_html": "<p>AI coding agents deliver better results when they remember facts about your project and your preferences. Without memory you would have to remind the agent about naming conventions, build commands and architectural patterns in every session. Different tools implement memory in different ways. This post compares the memory systems in GitHub Copilot, Claude Code, Kiro and Codex and offers guidance on creating and managing them.</p>\n<h2>Why memory matters</h2>\n<p>Memory lets agents persist context across interactions. When an agent knows your conventions, preferred tools and design decisions, it can generate more consistent and accurate suggestions. It also enables cross-feature sharing: in GitHub Copilot, facts and preferences captured by one feature can be used by another. The result is fewer repeated prompts and less need for long custom instructions.</p>\n<h2>GitHub Copilot Memory</h2>\n<p>GitHub Copilot includes an agentic memory feature that stores repository-level facts and user-level preferences. Repository facts cover conventions, build commands and cross-file dependencies, and are available to any user with access to Copilot Memory in that repository. Such facts are only created in response to actions by users with write access who have memory enabled. User-level preferences capture personal coding style and are visible only to that user&#39;s Copilot interactions, although on Business and Enterprise plans an organisation or enterprise administrator can export or delete them.</p>\n<p>Copilot memory is currently used by the Copilot coding agent, Copilot code review and the Copilot CLI, and facts captured by one of these can be reused by another. Any stored fact or preference that goes unused is automatically deleted after a period of disuse (28 days at the time of writing), and the timer can reset when Copilot validates and uses an entry. 
Note the scoping: repository-level facts can only be used in operations on the same repository, while user-level preferences follow the user across their interactions rather than being tied to one repository. Repository owners can review and manually delete the repository-level facts stored for their repository.</p>\n<h2>Claude Code memory</h2>\n<p>Claude Code uses two mechanisms: CLAUDE.md files and auto memory. CLAUDE.md is a markdown file you write yourself that contains instructions and context for the agent. You can scope it at the project level (committed to your repository), the user level (~/.claude/CLAUDE.md) or the organisation level through a managed policy file. CLAUDE.md files are loaded in full regardless of length, though Anthropic recommends targeting under 200 lines per file because longer files reduce adherence.</p>\n<p>Auto memory is a separate mechanism. Claude maintains a MEMORY.md entrypoint under ~/.claude/projects/&lt;project&gt;/memory/ and writes notes there as you work. The first 200 lines of MEMORY.md, or the first 25 KB, whichever comes first, are loaded at the start of every conversation. This load limit applies only to MEMORY.md, not to CLAUDE.md. Auto memory requires a recent version of Claude Code and is on by default; you can manage it with the /memory command. Write CLAUDE.md as clear imperative instructions, such as &quot;Always use pytest&quot; or &quot;Do not use synchronous file operations&quot;, and include the reason where it helps.</p>\n<h2>Kiro steering files</h2>\n<p>Kiro persists context through steering files. These markdown files live in .kiro/steering/ for workspace-specific guidance or ~/.kiro/steering/ for global conventions. Three foundational files are included: product.md describes the purpose, users and goals, tech.md records frameworks, libraries and technical constraints, and structure.md describes file organisation, naming conventions and architectural decisions. 
Steering files support inclusion modes: always (the default), fileMatch for conditional loading, manual for on-demand inclusion, and auto. Kiro also recognises AGENTS.md, which is always included and does not use inclusion modes.</p>\n<p>To manage steering files well, keep each one focused on a single domain, such as api-standards.md or testing-standards.md, and explain why decisions were made. Review them periodically and treat changes the way you treat code changes, through review and CI checks.</p>\n<h2>Codex memory</h2>\n<p>Codex has a memory feature that stores generated facts, preferences and project context across sessions. Memories are stored under ~/.codex/memories/ and you control them with the /memories command in the Codex app and TUI. Memories are generated automatically rather than added through manual commands. The feature is off by default, and you enable it by setting memories = true under the [features] table in config.toml. At launch it is not available in the UK, the EEA or Switzerland. For team-wide instructions, use AGENTS.md rather than memory, and reserve memory for personal preferences such as which package manager to use or where a module lives.</p>\n<h2>Custom memory management</h2>\n<p>When customising memory for any agent:</p>\n<ol>\n<li>Scope appropriately. Use global files for preferences that apply everywhere and project-specific files for repository rules. In Kiro that means separating .kiro/steering/ from ~/.kiro/steering/. In Claude Code, use project and user CLAUDE.md. In Codex, use memory for personal preferences and AGENTS.md for team policies.</li>\n<li>Write clear, actionable instructions. Agents follow explicit commands better than prose. State what to do (&quot;Use pytest, not unittest&quot;) and what to avoid.</li>\n<li>Review and prune regularly. Memory can grow stale. Copilot memories expire after a period of disuse. 
For Claude Code and Kiro, schedule periodic reviews to update instructions and delete obsolete guidance.</li>\n<li>Protect sensitive data. Never put secrets or credentials in memory files. CLAUDE.md, steering files and Codex memories live on disk as plain files, and Copilot repository facts are available to other users with access to Copilot Memory in that repository, so treat all of them with the same care as your codebase or dotfiles.</li>\n</ol>\n<h2>Conclusion</h2>\n<p>Persistent context is a core part of modern AI coding agents. Copilot stores repository facts and user preferences, Claude Code uses CLAUDE.md and auto memory, Kiro relies on steering files and Codex offers an opt-in memory feature for personal conventions. By understanding and managing these mechanisms, you can make your agents more consistent and productive while keeping your workflows under control.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "agents",
        "memory",
        "developer-workflow"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/sql-indexing-when-it-helps-and-when-it-hurts",
      "url": "https://soulstack.co.uk/blog/sql-indexing-when-it-helps-and-when-it-hurts",
      "title": "SQL indexing: when it helps and when it hurts",
      "summary": "An index is a trade. It makes some reads dramatically faster in exchange for slower writes, more storage, and a planner that now has more choices to get wrong. Treated as a defaul…",
      "content_html": "<p>An index is a trade. It makes some reads dramatically faster in exchange for slower writes, more storage, and a planner that now has more choices to get wrong. Treated as a default reflex (&quot;the query is slow, add an index&quot;), it often disappoints. Treated as a deliberate decision based on how a column is queried, it is one of the highest leverage tools you have. This post covers how B-tree indexes work, when they earn their keep, and the common cases where they do nothing or actively cost you, with the PostgreSQL and MySQL specifics that matter.</p>\n<h2>How a B-tree index works</h2>\n<p>A B-tree keeps its keys sorted and balanced, so the database can find a value, or the start of a range, in a small number of page reads rather than scanning the whole table. In PostgreSQL, CREATE INDEX builds a B-tree by default because it suits the most common cases, and the planner will consider it for the ordering and comparison operators (less than, less than or equal, equal, greater than or equal, greater than) as well as BETWEEN, IN, and IS NULL. In MySQL, B-tree is the structure InnoDB uses for its indexes, kept sorted to allow fast equality and range lookups.</p>\n<p>One InnoDB detail is worth holding in mind because it explains a lot of later behaviour: every InnoDB table has a clustered index that stores the row data itself, usually the primary key. Every other index is a secondary index, and each of its entries also carries the primary key columns. That is why InnoDB secondary indexes can answer more queries from the index alone than you might expect.</p>\n<h2>When an index helps</h2>\n<h3>Selectivity decides everything</h3>\n<p>An index pays off when a query touches a small fraction of the table. Selecting a thousand rows out of a hundred thousand is a good candidate. Selecting one row out of a hundred is usually not, because those hundred rows probably fit in a single page and no plan can beat reading one page sequentially. 
On a table that occupies a single page, you will nearly always get a sequential scan whether an index exists or not, and the planner is right to do so. This is why an index on a low cardinality column (a boolean, a status with three values) is frequently ignored: most values are not selective enough to be worth the indirection.</p>\n<p>The practical lesson is to index columns you filter or join on with high selectivity, and to stop expecting an index to help when the predicate matches a large share of rows.</p>\n<h3>Covering indexes and index-only scans</h3>\n<p>If an index contains every column a query needs, the database can answer from the index without visiting the table at all. PostgreSQL calls this an index-only scan, and you can deliberately build a covering index by adding non-key columns with INCLUDE, so they ride along as payload without being part of the search key.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> INDEX</span><span style=\"color:#D2A8FF\"> orders_customer_idx</span><span style=\"color:#FF7B72\"> ON</span><span style=\"color:#E6EDF3\"> orders (customer_id) </span><span style=\"color:#FF7B72\">INCLUDE</span><span style=\"color:#E6EDF3\"> (</span><span style=\"color:#FF7B72\">status</span><span style=\"color:#E6EDF3\">, total);</span></span></code></pre></div><p>MySQL expresses the same idea as a covering index, but there is no INCLUDE clause: you list the columns in the index itself, and InnoDB&#39;s habit of appending the primary key to secondary indexes means more of your queries are covered than the index definition alone suggests.</p>\n<h3>Composite indexes and column order</h3>\n<p>A multicolumn index is ordered by its first 
column, then its second within that, and so on, so column order is not cosmetic. The two engines differ in how forgiving they are. MySQL uses a strict leftmost prefix rule: an index on (col1, col2, col3) supports lookups on (col1), (col1, col2), and (col1, col2, col3), but not on col2 alone, because that is not a leftmost prefix. PostgreSQL is more permissive: a multicolumn B-tree can be used with any subset of its columns, but it is most efficient when the leading columns are constrained. The firm rule is that equality constraints on the leading columns, plus any inequality constraint on the first column that has no equality constraint, are what limit the portion of the index scanned. Either way, put the columns you filter on by equality first, and the range or sort column last.</p>\n<h2>When an index hurts or goes unused</h2>\n<h3>Write and storage cost</h3>\n<p>Once an index exists, the database keeps it synchronised with the table, which adds overhead to every insert, update, and delete, and it occupies storage. PostgreSQL states this plainly and advises that indexes which are seldom or never used should be removed. Every extra index you carry is paid for on every write, so a table with ten indexes can do up to ten times the index maintenance per row change that a table with one index does. Index for the reads you actually run, and drop the ones that no query uses.</p>\n<h3>Predicates that defeat an index</h3>\n<p>A column index only helps if the query lets the database use it. Several common patterns quietly prevent that:</p>\n<ul>\n<li>Applying a function or expression to the column. A filter like lower(email) = &#39;x&#39; cannot use a plain index on email. The expression itself has to be indexed: PostgreSQL supports indexes on expressions, and MySQL supports functional key parts (implemented as hidden generated columns) that the optimiser matches against.</li>\n<li>A leading wildcard in a pattern match. 
A B-tree can serve LIKE &#39;foo%&#39; because the pattern is anchored at the start, but not LIKE &#39;%bar&#39;, because there is no prefix to seek to.</li>\n<li>A type mismatch that forces a conversion. In MySQL, comparing an indexed string column to a number cannot use the index, because many different strings (&#39;1&#39;, &#39; 1&#39;, &#39;1a&#39;) convert to the same number, so the index ordering is no help.</li>\n</ul>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#8B949E\">-- Cannot use a plain index on email:</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">SELECT</span><span style=\"color:#FF7B72\"> *</span><span style=\"color:#FF7B72\"> FROM</span><span style=\"color:#E6EDF3\"> users </span><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#79C0FF\"> lower</span><span style=\"color:#E6EDF3\">(email) </span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\"> 'a@example.com'</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#8B949E\">-- Give the expression its own index (PostgreSQL):</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> INDEX</span><span style=\"color:#D2A8FF\"> users_lower_email_idx</span><span style=\"color:#FF7B72\"> ON</span><span style=\"color:#E6EDF3\"> users (</span><span style=\"color:#79C0FF\">lower</span><span style=\"color:#E6EDF3\">(email));</span></span></code></pre></div><h2>Confirming what the planner does</h2>\n<p>Do not guess whether an index is used, ask. 
In PostgreSQL, EXPLAIN shows the chosen plan (an Index Scan or Index Only Scan versus a Seq Scan), and EXPLAIN ANALYZE actually runs the query and reports real row counts and timings. In MySQL, EXPLAIN reports the candidate indexes in possible_keys and the one actually chosen in key. A key of NULL means no index was used, and a type of ALL means a full table scan.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">EXPLAIN ANALYZE </span><span style=\"color:#FF7B72\">SELECT</span><span style=\"color:#FF7B72\"> *</span><span style=\"color:#FF7B72\"> FROM</span><span style=\"color:#E6EDF3\"> orders </span><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#E6EDF3\"> customer_id </span><span style=\"color:#FF7B72\">=</span><span style=\"color:#79C0FF\"> 42</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><h2>Partial and expression indexes</h2>\n<p>Two PostgreSQL features let you index less and gain more. A partial index covers only the rows that satisfy a predicate, so an index on the small fraction of orders WHERE status = &#39;open&#39; is smaller and cheaper to maintain than one over every row. An expression index, as above, indexes a computed value rather than a stored column. 
Both narrow the index to exactly what your queries need, which is usually the point: a smaller, well targeted index beats a large general one.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> INDEX</span><span style=\"color:#D2A8FF\"> orders_open_idx</span><span style=\"color:#FF7B72\"> ON</span><span style=\"color:#E6EDF3\"> orders (created_at) </span><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#FF7B72\"> status</span><span style=\"color:#FF7B72\"> =</span><span style=\"color:#A5D6FF\"> 'open'</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><h2>Conclusion</h2>\n<p>Indexing is not about adding indexes, it is about matching them to how the data is queried. A B-tree on a selective column you filter or join on, ordered to match your predicates, and trimmed with a partial or covering definition, can turn a scan into a handful of page reads. The same index on a low selectivity column, or one defeated by a function, a leading wildcard, or a type mismatch, costs you on every write and returns nothing. Decide with EXPLAIN, not with hope, and remove the indexes that no query uses.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "sql",
        "performance",
        "postgresql"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/understanding-transactions-and-isolation-levels",
      "url": "https://soulstack.co.uk/blog/understanding-transactions-and-isolation-levels",
      "title": "Understanding transactions and isolation levels",
      "summary": "A transaction groups several statements into one unit that either fully happens or does not happen at all. Isolation levels decide how much one in flight transaction can see of an…",
      "content_html": "<p>A transaction groups several statements into one unit that either fully happens or does not happen at all. Isolation levels decide how much one in flight transaction can see of another&#39;s uncommitted or concurrent work. Get the level wrong and you either allow subtle data races or you serialise everything and lose throughput. The defaults are not the same across engines, which is the single most common source of surprise, so this post walks through the anomalies, the four standard levels, and how PostgreSQL and MySQL actually behave.</p>\n<h2>ACID in one paragraph</h2>\n<p>A transaction is described by four properties. Atomicity means all of its changes commit together or none do. Consistency means the database moves from one valid state to another, and a reader sees all old values or all new ones, never a mix. Isolation means concurrent transactions do not see each other&#39;s in progress work. Durability means that once a commit succeeds, the changes survive a crash or power loss. Isolation is the property with a dial on it, and that dial is the isolation level.</p>\n<h2>The read anomalies</h2>\n<p>The isolation levels are defined by which concurrency anomalies they permit.</p>\n<h3>Dirty read</h3>\n<p>A transaction reads a row that another transaction has written but not yet committed. If the writer rolls back, the reader acted on data that never existed.</p>\n<h3>Non-repeatable read</h3>\n<p>A transaction reads a row, a second transaction commits a change to that row, and when the first transaction reads it again it gets a different value. The same query, twice, two answers.</p>\n<h3>Phantom read</h3>\n<p>A transaction runs a query that matches a set of rows, another transaction commits a new row that also matches, and re-running the query now returns an extra row. 
The difference from a non-repeatable read is that whole rows appear or vanish rather than existing values changing.</p>\n<h2>The four isolation levels</h2>\n<p>The SQL standard defines four levels in terms of which anomalies they allow. Read uncommitted permits dirty reads, non-repeatable reads, and phantoms. Read committed forbids dirty reads but still permits non-repeatable reads and phantoms. Repeatable read additionally forbids non-repeatable reads, leaving phantoms. Serializable forbids all three and makes concurrent transactions behave as if they ran one after another. That is the standard. What each engine actually implements is stricter in places, and the defaults differ.</p>\n<h2>How PostgreSQL behaves</h2>\n<p>PostgreSQL defaults to read committed. It uses multiversion concurrency control (MVCC), so each statement sees a snapshot of the data as it was at a point in time, and reading never blocks writing while writing never blocks reading. Internally it implements only three distinct levels, so asking for read uncommitted gives you read committed behaviour: PostgreSQL never returns a dirty read.</p>\n<p>Its repeatable read is stronger than the standard requires. It is implemented as snapshot isolation and does not allow phantom reads. Its serializable level adds Serializable Snapshot Isolation (SSI) on top, which detects dependencies among concurrent transactions and aborts one rather than allowing a non serialisable outcome. 
Because of that, serializable transactions can fail with a serialization error at commit or at an earlier statement, and repeatable read transactions can fail the same way when they try to modify a row that a concurrent transaction has already changed. The application must be prepared to retry both.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">BEGIN</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">SET</span><span style=\"color:#FF7B72\"> TRANSACTION</span><span style=\"color:#FF7B72\"> ISOLATION</span><span style=\"color:#FF7B72\"> LEVEL</span><span style=\"color:#FF7B72\"> SERIALIZABLE</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#8B949E\">-- ... statements ...</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">COMMIT</span><span style=\"color:#E6EDF3\">;  </span><span style=\"color:#8B949E\">-- may raise: could not serialize access due to read/write dependencies among transactions</span></span></code></pre></div><h2>How MySQL InnoDB behaves</h2>\n<p>MySQL with InnoDB defaults to repeatable read, not read committed. This difference alone catches people porting assumptions between the two. InnoDB uses consistent snapshot reads, so within a transaction all plain reads see the snapshot established by the first read.</p>\n<p>InnoDB prevents phantoms at repeatable read through locking rather than pure snapshotting. It uses next-key locking, which combines an index record lock with a gap lock on the gap before the record, so other transactions cannot insert into the range you are reading. Drop to read committed and gap locking is largely disabled, which lets phantoms reappear. MySQL reaches serializable mainly by locking too: when autocommit is disabled, it implicitly turns plain SELECT statements into locking reads. 
The practical consequence is that the failure you hit under contention in MySQL is usually a lock wait timeout or a deadlock (which rolls back one transaction) rather than the dependency based serialization abort you get in PostgreSQL.</p>\n<h2>Handling serialization failures</h2>\n<p>At the strict levels, &quot;the database will sort it out&quot; is not a strategy. Under PostgreSQL serializable, and repeatable read too, a transaction can be aborted to preserve correctness, and the only correct response is to retry it from the beginning. Under MySQL, a deadlock victim is rolled back and likewise needs retrying. Either way, wrap transactions that run at these levels in retry logic with a bounded number of attempts, and make the work inside them idempotent so a retry is safe.</p>\n<h2>Setting the level</h2>\n<p>Both engines let you set the level per transaction with SET TRANSACTION ISOLATION LEVEL, choosing from serializable, repeatable read, read committed, and read uncommitted. Set it deliberately based on what the transaction needs: read committed for ordinary work, repeatable read or serializable when a transaction makes decisions based on data it must not see change underneath it.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">SET</span><span style=\"color:#FF7B72\"> TRANSACTION</span><span style=\"color:#FF7B72\"> ISOLATION</span><span style=\"color:#FF7B72\"> LEVEL</span><span style=\"color:#FF7B72\"> REPEATABLE</span><span style=\"color:#FF7B72\"> READ</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><h2>Conclusion</h2>\n<p>Isolation levels are a trade between how much concurrency anomaly you tolerate and how much throughput you keep. 
Start from the defaults, but know them: PostgreSQL gives you read committed, MySQL InnoDB gives you repeatable read, and the two engines reach the stricter levels by different means, snapshots and SSI in PostgreSQL, gap and next-key locks in MySQL. Raise the level only where a transaction genuinely needs it, and where you do, add retry logic, because at the top of the scale the database will sometimes refuse to commit rather than be wrong.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "sql",
        "transactions",
        "postgresql"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/connection-pooling-explained",
      "url": "https://soulstack.co.uk/blog/connection-pooling-explained",
      "title": "Connection pooling explained",
      "summary": "Most applications do not need many database connections; they need to reuse a few of them well. Opening a fresh connection for every request looks harmless in development and fall…",
      "content_html": "<p>Most applications do not need many database connections; they need to reuse a few of them well. Opening a fresh connection for every request looks harmless in development and falls over in production, because a database connection carries real cost on the server and databases cap how many can exist at once. A connection pool keeps a small set of established connections and hands them out as work arrives. This post explains why a connection is expensive, what a pool does, the difference between a client side pool and a server side pooler, and how to size one without making it worse.</p>\n<h2>Why a connection is not free</h2>\n<p>Establishing a connection means at least one network round trip and authentication before any query runs, and each live connection then consumes resources on the server for its whole lifetime. How much, and of what, depends on the engine.</p>\n<h3>PostgreSQL: a process per connection</h3>\n<p>PostgreSQL uses a process per user model. The server listens on a port and spawns a new backend operating system process for every connection, and each client talks to exactly one backend. Processes are not cheap, so the max_connections setting both caps concurrency and sizes server resources: PostgreSQL allocates certain resources, including shared memory, in proportion to max_connections, and the value can only be changed at server start. The default is typically 100. This is why PostgreSQL in particular benefits from keeping the number of real connections low, and why pushing max_connections up to paper over a connection leak tends to hurt.</p>\n<h3>MySQL: a thread per connection</h3>\n<p>MySQL&#39;s default model runs one thread per client connection rather than a process. Threads are lighter than processes, but they are not free either: there are as many threads as connected clients, thread creation and disposal become expensive as the connection count grows, and each thread needs server and kernel resources such as stack space. 
MySQL&#39;s max_connections defaults to 151, and the server actually permits one more than that, reserving the extra slot for an administrator with the CONNECTION_ADMIN privilege so that someone can still get in when ordinary client connections have used up the limit.</p>\n<h2>What a connection pool does</h2>\n<p>A pool opens a number of connections up front and keeps them alive. When the application needs to run a query it borrows a connection from the pool, uses it, and returns it, rather than opening and closing one each time. The connection setup cost is paid a handful of times at startup instead of on every request, and the number of real connections the database sees is bounded by the pool size no matter how many requests are in flight. That bound is the main prize: it protects the database from being swamped.</p>\n<h2>Client side pools versus server side poolers</h2>\n<p>Pooling can live in two places, and they solve overlapping but different problems.</p>\n<h3>A pool inside the application</h3>\n<p>A client side pool runs inside the application process. In the Java world HikariCP is the common choice, pooling JDBC connection objects within each instance of the app. This is simple and fast, but the bound is per instance: ten app instances with a pool of twenty each can still open two hundred connections to the database.</p>\n<h3>A pooler in front of the database</h3>\n<p>A server side pooler is a separate process that sits between the clients and the database and funnels many client connections down to a few database connections. PgBouncer is the standard example for PostgreSQL, describing itself as a lightweight connection pooler. 
Because it is central, it can hold the total number of real backends low across every app instance at once, which is exactly the pressure PostgreSQL&#39;s process per connection model is sensitive to.</p>\n<p>PgBouncer offers three pooling modes, and the difference matters:</p>\n<ul>\n<li>Session pooling: a server connection is tied to a client for as long as the client stays connected. The most compatible mode, and the least aggressive.</li>\n<li>Transaction pooling: a server connection is assigned to a client only for the duration of a transaction, then returned to the pool. This packs far more clients onto few connections.</li>\n<li>Statement pooling: like transaction pooling, but multi statement transactions are disallowed, effectively forcing autocommit.</li>\n</ul>\n<p>Transaction pooling breaks some client expectations by design, so it can only be used if the application avoids features that rely on session state. Without extra configuration, things that do not work in transaction mode include SET and RESET, LISTEN, cursors held across transactions, PREPARE and DEALLOCATE, session level temporary tables, the LOAD statement, and session level advisory locks. If your code leans on any of those, use session pooling or change the code.</p>\n<h2>Sizing a pool</h2>\n<p>A bigger pool is not a faster one. HikariCP&#39;s guidance on pool sizing is blunt: you want a small pool saturated with threads waiting for a connection, not a large one. 
As a starting point it borrows a formula from the PostgreSQL community:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>connections = (core_count * 2) + effective_spindle_count</span></span></code></pre></div><p>For a four core machine with a single spinning disk that works out to (4 * 2) + 1 = 9 connections, or ten as a round number, far fewer than most people guess. The spindle term is a proxy for I/O concurrency, and solid state storage complicates it, so treat the formula as a heuristic and a place to begin, then measure your own workload. The recurring finding behind the advice is that throughput flattens and latency climbs well before the pool gets large, because the database can only do so much work in parallel regardless of how many connections are queued at it.</p>\n<h2>Stacking pools</h2>\n<p>The two layers compose. A common topology is a client side pool such as HikariCP inside each application instance, pointed at a server side pooler such as PgBouncer in transaction mode, pointed at a small number of real PostgreSQL backends. The application pool gives each instance fast local reuse, and the central pooler keeps the total backend count low across all instances. This is an architectural pattern rather than a single documented recommendation, so size both layers deliberately: the app pools should not collectively open more connections than the pooler accepts, and the pooler should not demand more backends than the database is configured for.</p>\n<h2>Conclusion</h2>\n<p>Connections cost real resources, a process each in PostgreSQL and a thread each in MySQL, and both engines cap how many can exist. A pool turns that cost into a fixed, small set of reused connections and shields the database from request spikes. 
Decide where the pool lives, in the application, in front of the database, or both, choose a PgBouncer mode your code can actually tolerate, and resist the urge to make the pool large. Small and busy beats large and idle, and the database will thank you for it.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "postgresql",
        "performance",
        "infrastructure"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/a-practical-git-branching-strategy-for-small-teams",
      "url": "https://soulstack.co.uk/blog/a-practical-git-branching-strategy-for-small-teams",
      "title": "A practical Git branching strategy for small teams",
      "summary": "Small teams need a Git branching strategy that keeps the default branch stable, keeps review visible, and avoids process that is heavier than the work. A simple trunk based approa…",
      "content_html": "<p>Small teams need a Git branching strategy that keeps the default branch stable, keeps review visible, and avoids process that is heavier than the work. A simple trunk based approach with short lived topic branches is usually enough.</p>\n<h2>Start with a stable default branch</h2>\n<p>Treat the default branch as the integration branch. It should represent code that has passed review and the required automated checks. Do not use it as a place to collect unfinished work.</p>\n<p>Protect the default branch in the hosting platform. A useful minimum is:</p>\n<ul>\n<li>require a pull request before merging</li>\n<li>require at least one approving review</li>\n<li>require status checks to pass before merging</li>\n<li>block force pushes to the protected branch</li>\n<li>block deletion of the protected branch</li>\n</ul>\n<p>Those controls matter because a small team has fewer people available to notice accidental direct pushes, broken builds, or unreviewed changes. Branch protection turns the desired workflow into a repository rule instead of a team memory test.</p>\n<h2>Use short lived topic branches</h2>\n<p>Create a branch for each small change. A branch may hold a feature, bug fix, refactor, dependency update, or documentation change. Keep the scope narrow enough that the pull request can be reviewed in one sitting.</p>\n<p>A simple naming pattern is enough:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> fix/login-timeout</span></span></code></pre></div><p>Use lower case words separated by hyphens. Avoid encoding too much process into the branch name. 
The issue tracker, pull request title, and pull request description can carry the details.</p>\n<p>Short lived branches reduce merge conflicts because each branch spends less time diverging from the default branch. They also make review easier because the reviewer can understand the whole change without reconstructing weeks of context.</p>\n<h2>Keep long running branches rare</h2>\n<p>Long running branches are sometimes unavoidable, but they should not be the default. They create delayed integration risk: code appears safe inside the branch but conflicts with the real product state when it finally comes back.</p>\n<p>Prefer smaller slices behind configuration, feature flags, or inactive code paths when a change cannot ship at once. The goal is to merge safe pieces early while keeping unfinished behaviour disabled.</p>\n<p>Use a long running branch only when there is a clear reason, such as a release stabilisation branch. Give it an owner, an expiry point, and a merge policy. Without those limits, the branch becomes a second default branch.</p>\n<h2>Decide how changes reach the default branch</h2>\n<p>For most small teams, the merge policy should be consistent and boring. Pick one of these and document it.</p>\n<h3>Squash merge</h3>\n<p>Squash merge keeps the default branch tidy by turning a pull request into one commit. It works well when contributors make many small fixup commits during review.</p>\n<p>The trade off is that individual commits from the branch are not preserved on the default branch. That is acceptable if the pull request is the unit of review and the squashed commit message explains the final change clearly.</p>\n<h3>Merge commit</h3>\n<p>A merge commit preserves the branch topology. 
It is useful when the team wants to see exactly when a topic branch was integrated.</p>\n<p>The trade off is a noisier history, especially when many small branches are merged every day.</p>\n<h3>Rebase and merge</h3>\n<p>A rebase based workflow can keep history linear by replaying branch commits onto the default branch without a merge commit. It works best when branches are private to one author and are rebased before merge.</p>\n<p>The risk is rewriting commits that other people may already have based work on. Do not require developers to rebase shared branches unless the team understands the impact and has a clear recovery process.</p>\n<h2>Keep release branches simple</h2>\n<p>A small team does not need a complex release model for every repository. If the product deploys continuously, the default branch may be the release source.</p>\n<p>If releases need stabilisation, create a release branch from the default branch at the cut point:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#A5D6FF\"> main</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> pull</span><span style=\"color:#79C0FF\"> --ff-only</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> release/2026-06</span></span></code></pre></div><p>Only merge fixes that are needed for that release. After the release, merge or cherry pick the fixes back to the default branch if they are not already there. 
Do not continue normal feature work on the release branch.</p>\n<h2>Keep hotfixes visible</h2>\n<p>For an urgent production fix, branch from the commit that matches the affected release, or from the current default branch if that is what production runs. Make the smallest safe change, review it, run the checks, and merge it through the same protected path.</p>\n<p>Avoid direct pushes even under pressure. The process is most valuable when the change is urgent, because urgent changes are more likely to miss a step.</p>\n<h2>Delete branches after merge</h2>\n<p>Delete topic branches after they are merged. The useful record is the merged commit, pull request, review discussion, and linked issue. Keeping old branches around makes the branch list harder to read and increases the chance that someone restarts work from stale code.</p>\n<h2>Document the workflow in the repository</h2>\n<p>Put the workflow in a short contributor document. Include:</p>\n<ul>\n<li>the default branch name</li>\n<li>the branch naming convention</li>\n<li>the expected pull request size</li>\n<li>the required checks</li>\n<li>the merge method</li>\n<li>the hotfix process</li>\n<li>the rule for release branches</li>\n</ul>\n<p>A small document is easier to follow than an oral tradition. It also helps new contributors understand how to get a change merged without guessing.</p>\n<h2>Conclusion</h2>\n<p>A good branching strategy for a small team is simple: protect the default branch, use short lived topic branches, review every change, require passing checks, and avoid long running branches unless there is a specific release need. The best workflow is not the one with the most branch types. It is the one the team can follow every day without bypassing it.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "devops",
        "cli"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/a-practical-guide-to-caching-on-the-web",
      "url": "https://soulstack.co.uk/blog/a-practical-guide-to-caching-on-the-web",
      "title": "A practical guide to caching on the web",
      "summary": "Web caching stores responses so later requests can be answered faster, with less bandwidth and less load on the origin. The hard part is not turning caching on. The hard part is d…",
      "content_html": "<p>Web caching stores responses so later requests can be answered faster, with less bandwidth and less load on the origin. The hard part is not turning caching on. The hard part is deciding which responses may be reused, for how long, and by whom. This guide covers the controls that decide that, sensible defaults by response type, and how to debug what a cache is actually doing.</p>\n<h2>What can cache a response</h2>\n<p>A response can be cached by the browser, a shared proxy, a CDN, or a service worker. Browser caches are private to one user. Shared caches can serve many users from a single stored copy. Service worker caches are controlled by application code within an origin.</p>\n<p>Shared caches need stricter rules because one stored response can be reused for another user. Personalised responses should be marked private, or not stored at all, unless the application has a safe and deliberate design for sharing them.</p>\n<h2>Freshness and validation</h2>\n<p>A cached response is fresh when its caching metadata says it can be reused without contacting the origin. Freshness is commonly controlled with Cache-Control max-age or s-maxage.</p>\n<p>A stale response may still be useful if it can be validated. Validation asks the origin whether the stored response is still current. ETag with If-None-Match is the usual strong tool. Last-Modified with If-Modified-Since is also common but has lower precision.</p>\n<p>When the origin returns 304 Not Modified, the cache reuses the stored body and updates the stored header fields from the 304 response.</p>\n<h2>Cache-Control is the main control surface</h2>\n<p>Cache-Control carries directives for caches.</p>\n<ul>\n<li>max-age defines how long a response stays fresh, in seconds.</li>\n<li>s-maxage applies to shared caches and overrides max-age there. 
Private caches ignore it.</li>\n<li>no-store tells caches of any kind not to store the response.</li>\n<li>no-cache means the response may be stored, but it must be validated with the origin before each reuse. It does not mean do not cache.</li>\n<li>private means the response is intended for a single user and may be stored only in a private cache, never a shared one.</li>\n<li>public means the response may be stored by a shared cache, subject to the rest of the rules.</li>\n</ul>\n<p>Use these directives deliberately. no-cache and no-store are often confused, but they are not equivalent. One allows storage with mandatory revalidation, the other forbids storage entirely.</p>\n<h2>Good defaults by response type</h2>\n<p>Static assets with content hashed filenames can usually be cached for a long time, because a content change produces a new URL. The common pattern is a long max-age plus immutable for hashed assets, which lets caches skip revalidation while the response is fresh.</p>\n<p>HTML documents usually need shorter freshness or validation, because they point to the current asset graph. APIs vary. Public, read-only data may be cacheable. User specific data should normally be private or no-store depending on sensitivity.</p>\n<p>Authentication pages, account pages, and responses containing secrets should use no-store unless there is a specific reviewed reason not to. Note that a public directive will cause a response to an authenticated request to be stored in a shared cache, so use it with care.</p>\n<h2>Vary changes the cache key</h2>\n<p>Vary tells caches which request header fields affect the selected response. For example, Vary: Accept-Encoding is used when compressed and uncompressed variants exist. Vary: Origin matters when CORS policy depends on the Origin request header.</p>\n<p>Use Vary carefully. A broad Vary value can wreck cache effectiveness by creating many variants. 
A missing Vary value can make a cache reuse the wrong response.</p>\n<h2>CDNs and shared caches</h2>\n<p>CDNs are shared caches placed close to users. They can cache static assets, public pages, and selected API responses, hide origin latency, and absorb traffic spikes.</p>\n<p>Be explicit about shared cache behaviour. s-maxage can let a CDN hold a response for longer than the browser. private can prevent shared caching when a response is user specific. no-store covers responses that must not be stored anywhere.</p>\n<p>Do not rely on a CDN default for sensitive content. Set headers at the origin so behaviour is portable and reviewable.</p>\n<h2>Revalidation is not a failure</h2>\n<p>A 304 response is a successful optimisation, not a miss. The client still contacts a cache or the origin, but it avoids downloading the full body when the stored copy is current.</p>\n<p>This is useful for HTML, API responses, and assets whose URL does not change with content. It gives correctness without forcing a full download on every request.</p>\n<h2>Invalidation is harder than expiry</h2>\n<p>Expiry lets cached content age out naturally. Invalidation tries to remove or replace cached content before its normal lifetime ends. CDNs often provide purge APIs, but invalidation across browsers is harder, because browser caches are distributed across user devices.</p>\n<p>For assets, prefer versioned or hashed URLs. For documents and APIs, use shorter freshness windows and validation. For urgent correctness, design URLs and headers so clients are never stuck with long lived incorrect responses.</p>\n<h2>Debugging cache behaviour</h2>\n<p>Inspect response headers first. Check Cache-Control, ETag, Last-Modified, Age, Expires, and Vary. In browser developer tools, confirm whether the response came from memory cache, disk cache, a service worker, the network, or a CDN.</p>\n<p>Also inspect request headers. Conditional requests carry If-None-Match or If-Modified-Since. 
Cache bypasses may include Cache-Control request directives sent by the browser or developer tools.</p>\n<h2>Conclusion</h2>\n<p>Caching works best when the policy matches the data. Cache hashed static assets for a long time, validate HTML and changeable API responses, keep personalised data private, and avoid storing secrets. Use Cache-Control, validators, and Vary as the explicit contract between origin, browser, and shared caches.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "web",
        "performance",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/ai-coding-agents-need-platform-engineering-more-than-prompts",
      "url": "https://soulstack.co.uk/blog/ai-coding-agents-need-platform-engineering-more-than-prompts",
      "title": "AI coding agents need platform engineering more than prompts",
      "summary": "AI coding agents make platform engineering more important, not less. Better prompts can improve a single interaction, but agents that change code, open pull requests or run tasks…",
      "content_html": "<p>AI coding agents make platform engineering more important, not less. Better prompts can improve a single interaction, but agents that change code, open pull requests or run tasks need safe paths, clear permissions and reliable feedback from the engineering system around them.</p>\n<h2>Agents amplify the existing delivery system</h2>\n<p>An agent inherits the quality of the repository, tests, documentation, dependency model and deployment path it works inside. If a project has unclear ownership, flaky tests, hidden build steps and undocumented conventions, the agent will operate inside that confusion.</p>\n<p>Prompt quality matters, but it cannot replace engineering hygiene. The most useful instruction for an agent is often not a clever prompt. It is a working test suite, a clear contribution guide, a small task boundary and a platform that can prove whether the result is safe.</p>\n<h2>The platform becomes the control surface</h2>\n<p>Agents need the same golden paths as humans, with stricter boundaries. They should use standard project templates, standard CI checks, standard dependency policies and standard deployment gates.</p>\n<p>The platform should define what an agent can read, write, execute and submit. Repository permissions, workflow tokens, secrets, network access and environment access should be limited by default. An agent should not need broad production capability to edit a documentation page or refactor a test.</p>\n<h2>Pull requests are still the accountability boundary</h2>\n<p>When an agent produces code, the pull request should show the change, the tests, the generated artefacts and the reasoning that is safe to expose. The human reviewer remains accountable for accepting the change.</p>\n<p>That changes the review job. The reviewer must look for incorrect assumptions, overly broad changes, missing tests, insecure patterns and places where the agent satisfied the prompt while violating the intent. 
The platform can help by attaching policy results, dependency diffs and test evidence to the pull request.</p>\n<h2>Agent work should be small</h2>\n<p>Large autonomous changes are harder to review than large human changes because the reviewer has less memory of the path taken. Agents should be directed towards small, well scoped tasks with explicit acceptance criteria.</p>\n<p>A good task says what to change, what not to change, how to validate it and what evidence should appear in the pull request. Vague tasks produce vague diffs.</p>\n<h2>Treat prompts and outputs as untrusted input</h2>\n<p>Agent workflows can consume issue text, pull request descriptions, comments and other user controlled content. That content should be treated as untrusted input, especially when it can influence shell commands, workflow logic, credentials or generated code.</p>\n<p>The safe pattern is to keep untrusted text away from privileged execution, reduce token permissions, avoid passing secrets to agent contexts and require human approval before high impact actions.</p>\n<h2>Conclusion</h2>\n<p>AI coding agents do not remove the need for platform engineering. They increase the need for it. The organisations that benefit most will not be the ones with the cleverest prompts. They will be the ones with clear golden paths, safe permissions, trustworthy CI and review workflows that make agent output accountable.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "devops",
        "security",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/api-versioning-strategies-that-age-well",
      "url": "https://soulstack.co.uk/blog/api-versioning-strategies-that-age-well",
      "title": "API versioning strategies that age well",
      "summary": "API versioning works best when it is rare, explicit, and tied to client impact. The goal is not to ship versions quickly. The goal is to let clients keep running while the API evo…",
      "content_html": "<p>API versioning works best when it is rare, explicit, and tied to client impact. The goal is not to ship versions quickly. The goal is to let clients keep running while the API evolves.</p>\n<h2>Treat compatibility as the default</h2>\n<p>Most API changes should not require a new version. Add new endpoints instead of changing the meaning of existing ones. Add optional request fields instead of making existing fields mandatory. Add response fields only when clients are expected to ignore unknown fields. Add enum values only if the contract already says clients must handle unknown values safely.</p>\n<p>Avoid changes that alter existing behaviour behind the same contract. Renaming fields, changing field types, changing identifier formats, removing response fields, tightening validation, changing default sorting, and changing pagination tokens can break clients even when the server still returns a successful HTTP status.</p>\n<h2>Define what counts as breaking</h2>\n<p>Write the compatibility rules down before the first public release. A breaking change is any change that can make a conforming existing client fail, produce different business behaviour, or require a client release. That definition should include schema changes, semantic changes, authentication changes, rate limit changes, error shape changes, and webhook payload changes.</p>\n<p>Do not rely only on type compatibility. A field can keep the same type and still break clients if its meaning changes. For example, changing <code>status</code> from a current state to a latest event label is a semantic break even if it stays a string.</p>\n<h2>Choose one visible versioning model</h2>\n<p>The common choices are path versioning, header versioning, media type versioning, and date-based versioning. Pick one primary model and use it consistently.</p>\n<p>Path versioning, such as <code>/v1/orders</code>, is simple to discover and easy to route. 
It is also coarse, because the version appears to apply to every resource under that path.</p>\n<p>Header versioning keeps URIs cleaner and can make gradual migration easier, but it is less visible in logs, browser tools, and documentation unless tooling is disciplined.</p>\n<p>Media type versioning can work when representation formats are the main compatibility boundary, but it is harder for many teams to operate and explain.</p>\n<p>Date-based versioning can work well when every request declares the contract date it expects. It requires strong documentation, compatibility discipline, and tooling that can test behaviour by version date.</p>\n<p>Do not mix these models casually. Multiple versioning mechanisms create ambiguity and make support harder.</p>\n<h2>Version the contract clients depend on</h2>\n<p>A public API version should describe externally observable behaviour. It should not mirror service deployments, package versions, database migrations, or internal feature flags. A server can deploy many times without changing the API version. An API version changes when the client contract changes.</p>\n<p>This distinction matters during incidents. If support needs to answer what a client sees, the version must map to request and response behaviour, not to the current commit hash of a service.</p>\n<h2>Keep old versions boring</h2>\n<p>Old versions should receive security fixes, correctness fixes, and reliability improvements, but they should not keep gaining new product surface indefinitely. If every new feature is backported to every version, the maintenance cost grows quickly and clients lose the incentive to migrate.</p>\n<p>Document which versions are supported, what support means, and what happens after support ends. Keep behaviour stable for supported versions. Do not use silent degradation as a migration strategy.</p>\n<h2>Deprecate before removal</h2>\n<p>Deprecation is a communication process, not just a header. 
Announce what is changing, why it is changing, who is affected, what replacement exists, and when the old contract will stop working. Provide migration examples and test environments where possible.</p>\n<p>Use response headers to make deprecation visible to machines and logs. A deprecation signal should start before removal, and a sunset signal should identify the planned end date when one exists. Support teams should be able to identify active clients that still use the old version.</p>\n<h2>Make migrations observable</h2>\n<p>Every versioning strategy needs telemetry. Track requests by version, endpoint, authentication principal, client application, and error class. Without that data, deprecation becomes guesswork.</p>\n<p>Expose usage data to client owners where possible. A migration email is less useful than a report that says which endpoints the client still calls, which version they use, and when the last call happened.</p>\n<h2>Avoid permanent per-client forks</h2>\n<p>A compatibility exception for one important client may be reasonable during a migration window. It should not become a hidden version. Hidden versions are expensive because they are hard to test, hard to document, and easy to break by accident.</p>\n<p>If a behaviour must live for more than a short migration period, make it part of a documented version or remove it.</p>\n<h2>Test versions as products</h2>\n<p>Contract tests should run for each supported version. Examples should be valid for each supported version. SDK generation should target the correct version. Error responses, pagination, idempotency behaviour, and authentication failures should be tested as carefully as successful responses.</p>\n<p>Do not rely on manual review to preserve compatibility. 
Automated contract tests catch accidental breaking changes before clients do.</p>\n<h2>Conclusion</h2>\n<p>API versioning ages well when compatibility rules are clear, the versioning model is consistent, old versions stay stable, and deprecation is observable. The best versioning strategy is the one that makes breaking change rare, deliberate, and safe for clients to plan around.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "api",
        "architecture",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/backups-you-can-actually-restore-from",
      "url": "https://soulstack.co.uk/blog/backups-you-can-actually-restore-from",
      "title": "Backups you can actually restore from",
      "summary": "A backup is only useful when it can be restored inside the recovery window and to an acceptable point in time. Until restoration is tested, the backup is only an assumption.",
      "content_html": "<p>A backup is only useful when it can be restored inside the recovery window and to an acceptable point in time. Until restoration is tested, the backup is only an assumption.</p>\n<h2>Start with RPO and RTO</h2>\n<p>Recovery point objective, or RPO, is the maximum acceptable data loss measured as time. Recovery time objective, or RTO, is the maximum acceptable time to restore service.</p>\n<p>These numbers should be set by business impact, not by default tool settings. A public marketing site, an order database, and an audit log archive can have very different recovery needs.</p>\n<p>Backups must be designed to meet both values. A daily backup cannot meet a fifteen minute RPO. A backup that takes two days to restore cannot meet a four hour RTO.</p>\n<h2>Know what must be recovered</h2>\n<p>Reliable recovery needs more than database files. List everything required to restore the service.</p>\n<p>Typical recovery scope includes:</p>\n<ul>\n<li>application data</li>\n<li>schema and migration history</li>\n<li>object storage</li>\n<li>search indexes or a rebuild plan</li>\n<li>message streams or replay position</li>\n<li>configuration</li>\n<li>secrets and key material</li>\n<li>infrastructure definitions</li>\n<li>container images or release artefacts</li>\n<li>DNS and routing configuration</li>\n<li>runbooks and access paths</li>\n</ul>\n<p>If a component can be rebuilt, document the rebuild steps and expected duration. If it cannot be rebuilt inside the RTO, it belongs in the recovery plan.</p>\n<h2>Protect backups from the same failure</h2>\n<p>Backups should survive the incident that made them necessary. Accidental deletion, compromised credentials, ransomware, regional outage, bad deployment, and operator error all have different failure patterns.</p>\n<p>Use separation deliberately. That may mean separate accounts, separate regions, immutable storage, restricted deletion rights, separate encryption keys, and monitored backup access. 
The right design depends on the threat model and recovery objectives.</p>\n<p>A backup that can be deleted by the same identity that can delete production data is not strong protection against account compromise.</p>\n<h2>Automate creation and verification</h2>\n<p>Manual backups are easy to miss. Automate backup creation, retention, expiry, and monitoring.</p>\n<p>Monitor at least:</p>\n<ul>\n<li>last successful backup time</li>\n<li>backup size and unexpected size changes</li>\n<li>backup duration</li>\n<li>backup failure count</li>\n<li>replication or copy status</li>\n<li>retention policy compliance</li>\n<li>encryption status</li>\n</ul>\n<p>Verification must go beyond job success. A completed backup job does not prove that the data is usable.</p>\n<h2>Restore regularly</h2>\n<p>Periodic restore tests prove whether the backup process meets RPO and RTO. They also reveal missing permissions, missing configuration, slow transfer paths, incompatible versions, broken encryption keys, and undocumented manual steps.</p>\n<p>A restore test should create a fresh environment, restore from backup, run integrity checks, run application smoke tests, and record elapsed time. The result should be reviewed against the stated objectives.</p>\n<p>Do not test only the easiest path. Test point in time recovery, single object recovery, full environment recovery, and recovery after a deliberately bad change where relevant.</p>\n<h2>Make restoration repeatable</h2>\n<p>The restore process should be scripted where possible and documented where judgement is required.</p>\n<p>A good restore runbook includes:</p>\n<ul>\n<li>prerequisites and required access</li>\n<li>how to choose the restore point</li>\n<li>how to create the recovery environment</li>\n<li>restore commands or workflows</li>\n<li>validation checks</li>\n<li>cutover steps</li>\n<li>rollback or abort criteria</li>\n<li>communication points</li>\n<li>expected timings</li>\n</ul>\n<p>Keep commands current. 
A restore command copied from an old incident can be worse than no command at all.</p>\n<h2>Validate integrity and application behaviour</h2>\n<p>A database that starts is not necessarily a recovered service. Validate the data and the application.</p>\n<p>Use checks such as:</p>\n<ul>\n<li>database consistency checks</li>\n<li>expected table and object counts</li>\n<li>application smoke tests</li>\n<li>authentication tests</li>\n<li>critical read and write paths</li>\n<li>background worker checks</li>\n<li>audit log continuity</li>\n<li>monitoring and alerting checks</li>\n</ul>\n<p>Record known gaps. If search indexes are rebuilt after restore, state how long that takes and what users see while it happens.</p>\n<h2>Practise destructive scenarios safely</h2>\n<p>The hardest recoveries are caused by bad writes, accidental deletion, and compromise. Practise them in non-production or isolated recovery environments.</p>\n<p>Useful exercises include:</p>\n<ul>\n<li>restore after a deleted table</li>\n<li>restore after corrupted application data</li>\n<li>restore a single tenant or account where architecture supports it</li>\n<li>restore after credentials are rotated</li>\n<li>restore into a clean account or region</li>\n<li>prove that immutable backups cannot be altered by normal production roles</li>\n</ul>\n<p>The point is not to create theatre. The point is to find the step that fails before a real incident.</p>\n<h2>Keep evidence</h2>\n<p>Keep records of backup tests. Record the source backup, restore point, environment, commands or workflow used, duration, validation results, issues found, and follow-up actions.</p>\n<p>This evidence helps audits, but its operational value is greater. It shows whether the team can still restore after architecture, tooling, data volume, or staffing changes.</p>\n<h2>Conclusion</h2>\n<p>Backups are a recovery capability, not a storage habit. 
Define RPO and RTO, protect backups from the failures you expect, automate creation, test restoration, validate the recovered service, and keep evidence. A backup you cannot restore from is not a recovery plan.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "devops",
        "databases"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/blameless-postmortems-a-simple-template",
      "url": "https://soulstack.co.uk/blog/blameless-postmortems-a-simple-template",
      "title": "Blameless postmortems: a simple template",
      "summary": "A blameless postmortem turns an incident into operational learning. It records what happened, why it made sense at the time, what the impact was, and what will change. This guide…",
      "content_html": "<p>A blameless postmortem turns an incident into operational learning. It records what happened, why it made sense at the time, what the impact was, and what will change. This guide explains why the blameless approach works and gives you a template you can use today.</p>\n<h2>Why blameless matters</h2>\n<p>Blameless does not mean consequence free. It means the review focuses on systems, conditions, decisions, and safeguards rather than personal fault.</p>\n<p>People act with the information, tools, incentives, and time pressure they have during an incident. A useful review asks why the action seemed reasonable and how the system can make better outcomes more likely next time.</p>\n<p>Blame hides weak signals. Engineers stop sharing details when they expect punishment. Without detail, the organisation fixes symptoms and misses contributing factors.</p>\n<h2>When to write one</h2>\n<p>Write a postmortem for incidents with user impact, data risk, security relevance, missed service objectives, significant operational toil, difficult detection, difficult recovery, or repeated patterns.</p>\n<p>Do not reserve postmortems only for major outages. Smaller incidents often reveal the same weak controls before they cause larger failures.</p>\n<p>The process should be lightweight enough that teams can use it regularly.</p>\n<h2>What a postmortem must contain</h2>\n<p>A good postmortem contains:</p>\n<ul>\n<li>summary</li>\n<li>impact</li>\n<li>detection</li>\n<li>timeline</li>\n<li>contributing factors</li>\n<li>what went well</li>\n<li>what went poorly</li>\n<li>where the team was lucky</li>\n<li>follow-up actions with owners</li>\n<li>links to evidence</li>\n</ul>\n<p>Avoid vague root cause statements such as &quot;human error&quot;. They stop investigation too early. A better statement explains the conditions that allowed the action to cause impact.</p>\n<h2>Keep the timeline factual</h2>\n<p>The timeline should record observable events in order. 
Include alert times, deployment times, customer reports, mitigation steps, escalations, decisions, and recovery milestones.</p>\n<p>Use exact times and a single time zone. Mark uncertainty clearly. Do not rewrite the timeline to make the response look smoother than it was.</p>\n<p>The timeline is evidence. Analysis belongs in the contributing factors section.</p>\n<h2>Separate impact from drama</h2>\n<p>Impact should be specific and measurable where possible. State who or what was affected, for how long, and how severely.</p>\n<p>Useful impact statements include:</p>\n<ul>\n<li>percentage of failed requests</li>\n<li>number of affected users or tenants</li>\n<li>duration of elevated latency</li>\n<li>delayed jobs or messages</li>\n<li>data freshness delay</li>\n<li>support ticket volume</li>\n<li>missed service objective budget</li>\n</ul>\n<p>Avoid emotional language. The facts are enough.</p>\n<h2>Focus on contributing factors</h2>\n<p>Most incidents have multiple contributing factors. A deployment may trigger the incident, but weak tests, missing alerts, unsafe defaults, unclear ownership, poor rollback, or hidden coupling may let it grow.</p>\n<p>Look for factors in:</p>\n<ul>\n<li>detection</li>\n<li>diagnosis</li>\n<li>mitigation</li>\n<li>deployment and change control</li>\n<li>capacity and saturation</li>\n<li>dependency behaviour</li>\n<li>configuration and secrets</li>\n<li>documentation and runbooks</li>\n<li>access and tooling</li>\n<li>communication</li>\n</ul>\n<p>The goal is to improve the system of work, not to find one person or one line of code to blame.</p>\n<h2>Make actions concrete</h2>\n<p>A postmortem is only useful if it leads to change. Each action needs an owner, due date, expected outcome, and a way to verify completion.</p>\n<p>Good actions reduce likelihood, reduce impact, improve detection, improve recovery, or improve learning. 
Bad actions say only &quot;be more careful&quot;, &quot;add tests&quot; without naming the missing test, or &quot;improve monitoring&quot; without naming the signal and alert condition.</p>\n<p>Limit the number of actions. A small set of completed improvements is better than a long list that nobody finishes.</p>\n<h2>A simple template</h2>\n<h3>Summary</h3>\n<p>Write three to five sentences. State what happened, when it happened, how it was detected, the impact, and the current status.</p>\n<h3>Impact</h3>\n<p>Describe user, business, data, security, and operational impact. Include start time, end time, severity, affected functions, and measurable indicators.</p>\n<h3>Detection</h3>\n<p>Explain how the incident was detected. State whether detection came from monitoring, customer reports, internal users, scheduled checks, or manual review. Note any delay between impact and detection.</p>\n<h3>Timeline</h3>\n<p>List factual events in order. Use one time zone.</p>\n<h3>Contributing factors</h3>\n<p>Explain the technical, operational, and organisational conditions that contributed to the incident. Do not use personal blame as a cause.</p>\n<h3>What went well</h3>\n<p>Record behaviours, tools, safeguards, or preparation that helped.</p>\n<h3>What went poorly</h3>\n<p>Record gaps that made detection, diagnosis, mitigation, communication, or recovery harder.</p>\n<h3>Where we were lucky</h3>\n<p>Record conditions that limited the incident but should not be relied on next time.</p>\n<h3>Follow-up actions</h3>\n<p>For each action, include owner, due date, priority, expected outcome, and verification method.</p>\n<h3>Links</h3>\n<p>Link to dashboards, alerts, incident channel, deployment, logs, traces, tickets, and related postmortems.</p>\n<h2>Review the review</h2>\n<p>Schedule the review soon after the incident, while details are fresh. Invite people who responded and people who own the affected systems. 
Keep the meeting focused on learning and decisions.</p>\n<p>After the meeting, publish the postmortem where the team can find it. Track actions to completion. Review older postmortems for repeated themes.</p>\n<h2>Conclusion</h2>\n<p>A blameless postmortem is a practical engineering tool. It documents impact, preserves the timeline, explains contributing factors, and turns learning into owned actions. Keep it factual, humane, and specific enough to change the system.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/building-guardrails-for-ai-generated-software-without-killing-speed",
      "url": "https://soulstack.co.uk/blog/building-guardrails-for-ai-generated-software-without-killing-speed",
      "title": "Building guardrails for AI-generated software without killing speed",
      "summary": "AI generated software needs guardrails, but guardrails that slow every change equally will be bypassed. The useful approach is risk based: make low risk work fast, make high risk…",
      "content_html": "<p>AI generated software needs guardrails, but guardrails that slow every change equally will be bypassed. The useful approach is risk based: make low risk work fast, make high risk work explicit, and keep privileged actions under human control.</p>\n<h2>Guardrails should sit where decisions happen</h2>\n<p>A policy document is not a guardrail unless it changes behaviour. Useful guardrails appear in the editor, the pull request, the CI pipeline, the deployment system, and runtime controls.</p>\n<p>Examples include dependency review before merge, secret scanning across repository history, code scanning in pull requests, protected branches, required reviews for sensitive paths, restricted workflow permissions, and deployment approvals for production environments. These are not exotic. They are standard platform features, and using them well is most of the work.</p>\n<h2>Classify change risk</h2>\n<p>Not every generated change deserves the same process. A documentation edit, a unit test refactor, a dependency upgrade, an authentication change, and a database migration have very different risk profiles, so treating them identically wastes effort on safe changes and under-scrutinises dangerous ones.</p>\n<p>A good guardrail system classifies changes by files touched, permissions changed, dependencies altered, data paths affected, and deployment impact. Low risk changes should move quickly when checks pass. High risk changes should require stronger review and evidence.</p>\n<h2>Keep secrets out of agent context</h2>\n<p>Secrets should not be available to general code generation tasks. An agent does not need production credentials to edit source files. If a task genuinely requires sensitive access, it should use narrow, auditable, short lived permissions and a workflow designed for that purpose.</p>\n<p>Secret scanning remains necessary because mistakes will still happen. 
The control should detect exposed credentials and trigger rotation, not merely warn that a secret shaped string exists somewhere in the diff.</p>\n<h2>Treat generated code like third party code</h2>\n<p>AI output can introduce vulnerable patterns, licence concerns, dependency bloat, or code that nobody fully understands. Review it with the same scepticism you would apply to an external contribution.</p>\n<p>In practice that means checking provenance where relevant, preferring small diffs, requiring tests that prove behaviour, avoiding unnecessary dependencies, and rejecting code the team cannot maintain. The author being a model rather than a person does not lower the bar.</p>\n<h2>Use policy as code carefully</h2>\n<p>Policy as code can keep standards consistent, but it has to be understandable. A blocked change should explain what rule fired, why it matters, and how to fix it or request an exception.</p>\n<p>Opaque policy creates resentment. Clear policy creates speed, because engineers can correct issues themselves without waiting for another team to interpret the result.</p>\n<h2>Measure bypass pressure</h2>\n<p>If teams constantly request exceptions, disable checks, or move work outside the standard path, the guardrails are probably misaligned. Track bypasses, false positives, review delays, and any incidents that escaped controls, then tune accordingly.</p>\n<p>The aim is not maximum restriction. The aim is fewer unsafe changes reaching production with less human toil.</p>\n<h2>Conclusion</h2>\n<p>AI generated software does not need a separate bureaucracy. It needs strong engineering defaults: least privilege, trustworthy CI, targeted review, clear policy, and runtime detection. Guardrails preserve speed when they sit close to the work, scale with risk, and are easy to understand.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "security",
        "devops",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/choosing-between-sql-and-nosql",
      "url": "https://soulstack.co.uk/blog/choosing-between-sql-and-nosql",
      "title": "Choosing between SQL and NoSQL",
      "summary": "SQL and NoSQL are not quality labels. They describe different data models, query models and operational trade-offs. The right choice depends on the shape of the data, the invarian…",
      "content_html": "<p>SQL and NoSQL are not quality labels. They describe different data models, query models and operational trade-offs. The right choice depends on the shape of the data, the invariants the system must protect, the queries it must answer and how the workload must scale.</p>\n<h2>Start with the data model</h2>\n<p>A relational database stores data in tables with rows, columns, keys and constraints. It is a strong fit when the domain has clear relationships, shared entities and rules that should be enforced close to the data.</p>\n<p>A document database stores related data together in documents. It is a strong fit when an aggregate is usually read and written as one unit, and when the document shape needs to evolve without coordinating every field across every record at the same time.</p>\n<p>A key-value store is a strong fit for simple lookups by key, cache-like access patterns and workloads where the application already knows the access path.</p>\n<p>A wide-column or column-family store can fit very large, high-throughput access patterns, but it often requires query-driven modelling up front.</p>\n<h2>Use SQL when relationships drive the workload</h2>\n<p>Choose a relational database when joins, constraints and transactions are central to correctness. Foreign keys, unique constraints, check constraints and transactional updates are useful because they put critical rules in the database rather than spreading them across application code.</p>\n<p>SQL is also a good fit for ad hoc reporting and changing query requirements. A normalised model can answer new questions without duplicating every possible read shape in advance.</p>\n<p>Relational databases are not limited to small systems. Modern SQL engines support partitioning, replication, indexing, materialised views and high availability patterns. 
The question is not whether SQL can scale, but whether its scaling model matches the workload.</p>\n<h2>Use NoSQL when access patterns are stable and aggregate-shaped</h2>\n<p>Choose a document database when most operations read or write a whole aggregate, such as a profile, catalogue item or event payload. Embedding related data can reduce joins and make reads simple.</p>\n<p>Choose a key-value design when the dominant operation is direct lookup by key and secondary querying is limited or handled elsewhere.</p>\n<p>NoSQL systems often trade general query flexibility for simpler horizontal scaling or a model that matches the application aggregate. That trade can be excellent when access patterns are known and stable. It can be expensive when the product later needs complex cross-entity queries.</p>\n<h2>Be careful with consistency assumptions</h2>\n<p>Do not assume all SQL systems provide the same isolation, and do not assume all NoSQL systems lack transactions. Many document databases support indexes and multi-document transactions, but the limits, performance costs and modelling guidance are engine-specific.</p>\n<p>The key question is the invariant. If the system must update several independent records together and reject conflicting writes, check the exact transaction and isolation behaviour before choosing the database.</p>\n<p>If eventual consistency is acceptable, design the user experience and reconciliation process explicitly. Eventual consistency is not a reason to ignore correctness. It is a decision about when and how correctness is reached.</p>\n<h2>Model for queries, not labels</h2>\n<p>A database choice should be tested against the real queries.</p>\n<p>Can the system fetch the common read path without excessive joins, scans or network round trips? Can it enforce uniqueness where it matters? Can it update related data atomically where required? 
Can it answer support, audit and reporting questions without building a second database immediately?</p>\n<p>If the answer depends on adding a search engine, cache, warehouse or stream processor, include that system in the design. The database choice is then part of a wider data architecture, not a single-product decision.</p>\n<h2>Consider operations</h2>\n<p>Operational maturity matters. Backups, restores, migrations, monitoring, connection limits, indexing, failover and staff familiarity can dominate the theoretical model.</p>\n<p>A familiar relational database may be a better first choice than an unfamiliar NoSQL system if the workload is ordinary and the team needs predictable operations. A specialised NoSQL system may be justified when the workload clearly exceeds what a general relational model can serve cleanly.</p>\n<h2>Common mistakes</h2>\n<p>Do not choose NoSQL only to avoid schema design. Every production database has a schema, even if the database does not enforce it centrally. The schema may live in application code, validation rules, indexes, analytics jobs and migration scripts.</p>\n<p>Do not choose SQL only because it is familiar. If every request reads and writes one large aggregate and the relational model only adds joins and impedance, a document model may be simpler.</p>\n<p>Do not split one domain across multiple databases without a reason. Cross-database consistency, migrations and observability are harder than a single-database design.</p>\n<h2>Conclusion</h2>\n<p>Choose SQL when relationships, constraints, transactions and flexible querying are central. Choose NoSQL when the data is naturally aggregate-shaped, access patterns are stable and the chosen engine&#39;s consistency model fits the invariants. The best decision comes from modelling real reads, writes and failure cases, not from treating SQL and NoSQL as competing slogans.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "sql",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/cookies-sessions-and-tokens-explained",
      "url": "https://soulstack.co.uk/blog/cookies-sessions-and-tokens-explained",
      "title": "Cookies, sessions and tokens explained",
      "summary": "Cookies, sessions and tokens are often discussed together because they all help a web application recognise a later request, but they are not the same thing. A cookie is a browser…",
      "content_html": "<p>Cookies, sessions and tokens are often discussed together because they all help a web application recognise a later request, but they are not the same thing. A cookie is a browser storage and transport mechanism, a session is server side continuity, and a token is a credential or claim container that can be sent with a request. Keeping the roles separate is the difference between a clean design and a confusing one.</p>\n<h2>HTTP is stateless</h2>\n<p>HTTP does not remember earlier requests by itself. Each request carries the information the server needs to understand it. State is added by application design, most commonly with cookies, server side sessions or bearer tokens.</p>\n<p>That stateless base is useful because it keeps the protocol simple and cacheable. It also means authentication state must be handled deliberately.</p>\n<h2>What a cookie does</h2>\n<p>A cookie is a small piece of data set by a server with the Set-Cookie response header and returned by the browser in later Cookie request headers when the cookie rules match. The browser decides when to send it based on attributes such as Domain, Path, Secure, HttpOnly, SameSite, Expires and Max-Age.</p>\n<p>Use Secure for cookies that should only be sent over HTTPS. Use HttpOnly so client side JavaScript cannot read the cookie through document.cookie. Use SameSite to control when the cookie is sent with cross-site requests.</p>\n<p>A cookie is not automatically a session. It is just a mechanism for storing and sending name value data under browser controlled rules.</p>\n<h2>What a server side session does</h2>\n<p>A server side session stores state on the server. The browser usually receives an opaque session identifier in a cookie. On each request, the server looks up that identifier and recovers the associated session state.</p>\n<p>This model gives the server direct control over revocation and expiry. It also keeps sensitive session data out of the browser. 
The trade-off is that the service must store session state or use infrastructure that can locate it reliably.</p>\n<p>Regenerate the session identifier after authentication and privilege changes. That reduces the risk from session fixation, where an attacker tries to force a known session identifier onto a user before they log in.</p>\n<h2>What a token does</h2>\n<p>A token is a value presented by the client as proof of authentication, authorisation or both. It may be opaque, meaning only the issuer can interpret it. It may also be structured, such as a signed JSON Web Token.</p>\n<p>Bearer tokens are common. Any party in possession of a bearer token can use it, so it must be protected as a secret. If a bearer token is stolen, the server cannot tell from the token alone that the wrong party is using it.</p>\n<p>Tokens are often sent in the Authorization header with the Bearer scheme. They can also be stored in cookies, but doing so means the browser cookie rules, CSRF protections and same-site behaviour become part of the security design.</p>\n<h2>Cookies versus Authorization headers</h2>\n<p>Cookies are sent automatically by the browser when their rules match. This is convenient for web applications, but it means cross-site request forgery has to be considered for state changing requests.</p>\n<p>Authorization headers are usually added by application code. This can reduce accidental cross-site submission, but it does not remove the need to protect the token from script access, logs, browser extensions and accidental exposure.</p>\n<p>There is no universal winner. Choose based on the client type, threat model, revocation needs and deployment architecture.</p>\n<h2>Expiry and revocation</h2>\n<p>Short lifetimes reduce the value of a stolen credential. Server side sessions can usually be revoked immediately by deleting or invalidating server state. 
Opaque tokens can also be revoked if the server checks them against issuer side state.</p>\n<p>Self-contained signed tokens are harder to revoke before expiry because the resource server may be able to validate them without contacting the issuer. That can be useful for scaling, but it pushes more importance onto short lifetimes, key rotation and careful audience and scope checks. Where logout must invalidate a self-contained token immediately, the usual answer is a server side denylist, which gives up some of the stateless benefit.</p>\n<h2>Storage choices matter</h2>\n<p>For browser based applications, HttpOnly Secure cookies are usually safer for long lived session credentials because JavaScript cannot read them. Local storage and session storage are accessible to any JavaScript running in the origin, so a single XSS bug can expose every token stored there.</p>\n<p>This does not make cookies magic. A cookie based design still needs SameSite, CSRF defences for unsafe methods, secure transport, session rotation, logout and server side authorisation checks.</p>\n<h2>Practical defaults</h2>\n<p>Use HTTPS everywhere. Put session identifiers in Secure, HttpOnly cookies. Set SameSite to Lax unless the application has a confirmed cross-site requirement. Use SameSite=None only with Secure and only when cross-site cookie sending is required.</p>\n<p>Keep identifiers unpredictable and generated by a cryptographically secure source. Do not put secrets, roles or personal data in unsigned client readable cookies. Do not treat token claims as trustworthy unless the token signature, issuer, audience, expiry and intended use have been validated.</p>\n<h2>Common mistakes</h2>\n<p>Do not store access tokens in local storage without understanding the XSS risk. Do not use long lived bearer tokens for browser sessions unless there is a strong reason. Do not rely on the presence of a cookie as authorisation. Authentication says who the caller is. 
Authorisation decides what that caller may do.</p>\n<p>Do not log cookies, access tokens or refresh tokens. Logs often have broader access and longer retention than application memory.</p>\n<h2>Conclusion</h2>\n<p>Cookies transport browser state, sessions keep continuity and tokens carry credentials or claims. Secure designs keep these roles separate. The safest approach is usually boring: HTTPS, HttpOnly Secure cookies for browser sessions, short lifetimes, server side revocation where needed, CSRF protection and strict validation on every request.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web",
        "api"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/cors-demystified",
      "url": "https://soulstack.co.uk/blog/cors-demystified",
      "title": "CORS, demystified",
      "summary": "Cross-Origin Resource Sharing, usually called CORS, is a browser enforcement mechanism controlled by HTTP headers. It does not protect a server from receiving requests. It tells t…",
      "content_html": "<p>Cross-Origin Resource Sharing, usually called CORS, is a browser enforcement mechanism controlled by HTTP headers. It does not protect a server from receiving requests. It tells the browser whether frontend code from one origin may read a response from another origin.</p>\n<h2>Origin is the key boundary</h2>\n<p>An origin is the combination of scheme, host and port. <a href=\"https://example.com\">https://example.com</a> and <a href=\"https://api.example.com\">https://api.example.com</a> are different origins. <a href=\"http://example.com\">http://example.com</a> and <a href=\"https://example.com\">https://example.com</a> are also different origins because the scheme is different.</p>\n<p>Browsers apply the same-origin policy to protect one site from reading another site&#39;s data. CORS is a controlled relaxation of that policy for selected cross-origin requests.</p>\n<h2>CORS is enforced by browsers</h2>\n<p>A server can receive a cross-origin request even when CORS is misconfigured. The browser may still block frontend JavaScript from reading the response. That distinction matters when debugging: a CORS error in the console is usually about browser access to the response, not whether the request reached the server.</p>\n<p>Non-browser clients such as curl, backend services and many mobile HTTP clients do not enforce browser CORS rules in the same way. Do not use CORS as an API authentication control.</p>\n<h2>Simple requests and preflight requests</h2>\n<p>Some cross-origin requests are sent directly by the browser. Others require a preflight. A preflight is an OPTIONS request sent before the actual request so the browser can ask the server whether the cross-origin operation is allowed.</p>\n<p>A request avoids preflight only when it uses GET, HEAD or POST, sets only CORS-safelisted request headers, and uses a Content-Type of application/x-www-form-urlencoded, multipart/form-data or text/plain. Anything else triggers preflight. 
This is why a cross-origin request with an Authorization header or Content-Type: application/json is preflighted, which surprises many developers.</p>\n<p>The preflight response must authorise the method, headers and origin. If it does not, the browser blocks the actual request or blocks access to the response.</p>\n<h2>The core response headers</h2>\n<p>Access-Control-Allow-Origin tells the browser which origin may read the response. It can be a specific origin or the wildcard *, but * cannot be used for credentialed browser requests.</p>\n<p>Access-Control-Allow-Methods lists methods allowed for the actual request after preflight. Access-Control-Allow-Headers lists request headers allowed for the actual request after preflight. Access-Control-Max-Age tells the browser how long it may cache the preflight result.</p>\n<p>Access-Control-Allow-Credentials: true allows the browser to expose a credentialed response when the request includes credentials and the origin is explicitly allowed.</p>\n<h2>Credentials change the rules</h2>\n<p>Credentials include cookies, TLS client certificates and HTTP authentication entries. By default, browsers do not send credentials on cross-origin fetch or XMLHttpRequest calls unless the code opts in, for example with credentials set to include.</p>\n<p>When credentials are used, the server must return a specific Access-Control-Allow-Origin value. It must not use *. It must also return Access-Control-Allow-Credentials: true for the browser to expose the response to JavaScript.</p>\n<p>Because cookies can be sent automatically, credentialed CORS needs careful server side authorisation. CORS does not replace CSRF protection for state changing requests.</p>\n<h2>Dynamic origin reflection is risky</h2>\n<p>Some systems read the Origin request header and echo it back as Access-Control-Allow-Origin. 
That is only safe when the origin is checked against a strict allowlist first.</p>\n<p>Never combine unrestricted origin reflection with credentials. That pattern can let an attacker&#39;s site read responses that were intended only for a trusted frontend.</p>\n<p>Also add Vary: Origin when the response varies by Origin and can pass through shared caches. Without Vary, a cache can reuse a response authorised for one origin in a different origin context.</p>\n<h2>CORS failures are usually configuration failures</h2>\n<p>A failed CORS request is often caused by one missing header on the preflight response, a redirect during preflight, an origin mismatch, a method not listed in Access-Control-Allow-Methods, or an application error path that omits CORS headers.</p>\n<p>Check the network panel, not only the console message. The console often reports the browser level failure, while the network panel shows whether the preflight was sent, what status came back and which headers were present.</p>\n<h2>CORS is not a security boundary for APIs</h2>\n<p>CORS can limit which browser origins can read responses. It does not prove who the user is. It does not stop direct requests from scripts, servers or command line tools. It does not validate permissions.</p>\n<p>Every protected endpoint still needs authentication and authorisation. Treat CORS as browser integration policy, not access control.</p>\n<h2>Practical configuration</h2>\n<p>Allow only the frontend origins that need browser access. Return exact origins rather than broad wildcards for application APIs. Keep allowed methods and headers narrow. Enable credentials only when the browser must send cookies or HTTP credentials.</p>\n<p>For public, unauthenticated, read-only resources that are safe for any site to read, Access-Control-Allow-Origin: * can be appropriate. For private APIs, use explicit origins and real authentication.</p>\n<h2>Conclusion</h2>\n<p>CORS is easiest to understand when separated from authentication. 
The server receives requests according to normal HTTP rules. The browser decides whether frontend JavaScript may read the response according to CORS headers. Configure it narrowly, test the preflight path and never treat it as the only protection on an API.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "web",
        "security",
        "api"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/database-migrations-without-downtime",
      "url": "https://soulstack.co.uk/blog/database-migrations-without-downtime",
      "title": "Database migrations without downtime",
      "summary": "A database migration is safe when old code and new code can run at the same time while the schema changes. Downtime usually appears when a migration takes an exclusive lock, rewri…",
      "content_html": "<p>A database migration is safe when old code and new code can run at the same time while the schema changes. Downtime usually appears when a migration takes an exclusive lock, rewrites a large table, blocks writes or makes application versions disagree about the shape of the data. The fix is to design migrations as small, reversible, compatibility-preserving steps.</p>\n<h2>Use expand and contract</h2>\n<p>The safest pattern is expand and contract. First expand the schema so both old and new application versions can work. Then deploy the application change. Then backfill data if needed. Finally contract the schema after the old code path is gone.</p>\n<p>A nullable column is usually safer to add than a required column with an immediate full-table backfill. The application can start writing the new column, a background job can backfill existing rows in batches, and a later migration can add a constraint after validation.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">ALTER</span><span style=\"color:#FF7B72\"> TABLE</span><span style=\"color:#E6EDF3\"> users </span><span style=\"color:#FF7B72\">ADD</span><span style=\"color:#E6EDF3\"> COLUMN display_name </span><span style=\"color:#FF7B72\">text</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><p>Avoid changing a column meaning in place. Add the new column or table, dual-write when necessary, verify consistency, move reads, then remove the old path in a later release.</p>\n<h2>Avoid long blocking operations</h2>\n<p>DDL is not automatically harmless. 
Some database engines can perform many operations online, but the exact lock level depends on the engine, version, table definition and operation.</p>\n<p>In PostgreSQL, many forms of <code>ALTER TABLE</code> acquire an <code>ACCESS EXCLUSIVE</code> lock, while some subcommands take a lesser lock level documented per operation. PostgreSQL also supports <code>CREATE INDEX CONCURRENTLY</code>, which builds an index without taking locks that prevent concurrent inserts, updates or deletes on the table.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> INDEX</span><span style=\"color:#D2A8FF\"> CONCURRENTLY</span><span style=\"color:#E6EDF3\"> orders_customer_id_idx</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">ON</span><span style=\"color:#E6EDF3\"> orders (customer_id);</span></span></code></pre></div><p>In MySQL InnoDB, online DDL uses algorithms such as <code>INSTANT</code>, <code>INPLACE</code> and <code>COPY</code>. 
A migration should request the intended algorithm when possible, so it fails rather than silently falling back to a slower or more blocking path.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">ALTER</span><span style=\"color:#FF7B72\"> TABLE</span><span style=\"color:#E6EDF3\"> users</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">ADD</span><span style=\"color:#E6EDF3\"> COLUMN display_name </span><span style=\"color:#FF7B72\">varchar</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#79C0FF\">255</span><span style=\"color:#E6EDF3\">),</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">ALGORITHM=</span><span style=\"color:#E6EDF3\">INSTANT;</span></span></code></pre></div><p>For large MySQL table changes that cannot be done safely by native online DDL, tools such as gh-ost copy data to a shadow table and apply ongoing changes while the original table remains in use.</p>\n<h2>Backfill in batches</h2>\n<p>Backfills should be restartable, observable and rate limited. A single transaction that updates millions of rows can hold locks for too long, generate too much replication lag and make rollback expensive.</p>\n<p>Use deterministic batches. Record progress outside the transaction or make each batch idempotent. 
Keep transactions short, sleep between batches when the system is under load and stop automatically if replication lag or error rates rise.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">UPDATE</span><span style=\"color:#E6EDF3\"> users</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">SET</span><span style=\"color:#E6EDF3\"> display_name </span><span style=\"color:#FF7B72\">=</span><span style=\"color:#E6EDF3\"> username</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#E6EDF3\"> display_name </span><span style=\"color:#FF7B72\">IS</span><span style=\"color:#FF7B72\"> NULL</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  AND</span><span style=\"color:#E6EDF3\"> id </span><span style=\"color:#FF7B72\">>=</span><span style=\"color:#79C0FF\"> 10000</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  AND</span><span style=\"color:#E6EDF3\"> id </span><span style=\"color:#FF7B72\">&#x3C;</span><span style=\"color:#79C0FF\"> 11000</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><p>Do not assume the backfill is complete because the job reached the end once. Verify with a count query, then keep the application tolerant of missing values until the constraint is enforced.</p>\n<h2>Keep application versions compatible</h2>\n<p>Zero downtime deployment means more than database availability. It means any live application instance can talk to the database during the rollout.</p>\n<p>A safe migration must support four states: old code with old schema, old code with expanded schema, new code with expanded schema, and new code after cleanup. 
If any state breaks, the deployment depends on perfect sequencing and becomes fragile.</p>\n<p>Feature flags can help move reads and writes gradually. They also give a rollback path that does not require an emergency schema reversal.</p>\n<h2>Validate before enforcing</h2>\n<p>Constraints are valuable because they move invariants into the database. They still need careful rollout. Add them after the data is clean, and use engine-specific validation features where available, such as PostgreSQL&#39;s <code>NOT VALID</code> followed by <code>VALIDATE CONSTRAINT</code> for foreign keys and check constraints.</p>\n<p>Before adding a <code>NOT NULL</code> constraint, a unique constraint or a foreign key, run checks that prove the existing data satisfies it. Then add the constraint in the least blocking way the database supports.</p>\n<h2>Test on production-like data</h2>\n<p>A migration that completes instantly on a small development database may rewrite a table for hours in production. Test with realistic row counts, indexes, constraints, triggers, replication and concurrent writes.</p>\n<p>Record the expected lock behaviour, runtime, replication impact and rollback plan. The rollback plan should say whether you can revert application code only, stop a backfill, drop a new object or restore data from backup.</p>\n<h2>Conclusion</h2>\n<p>Downtime-free migrations are built from compatibility, small steps and measured database behaviour. Expand first, deploy code second, backfill carefully, validate constraints and contract only after the old path is gone. Never treat DDL as safe until the exact operation has been checked against the engine documentation and tested on data that behaves like production.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "sql",
        "reliability",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/debugging-methodically-a-checklist",
      "url": "https://soulstack.co.uk/blog/debugging-methodically-a-checklist",
      "title": "Debugging methodically: a checklist",
      "summary": "Debugging is faster when you treat it as controlled investigation. The aim is to move from symptom to cause with evidence, not to cycle through guesses until the failure disappear…",
      "content_html": "<p>Debugging is faster when you treat it as controlled investigation. The aim is to move from symptom to cause with evidence, not to cycle through guesses until the failure disappears.</p>\n<h2>State the failure precisely</h2>\n<p>Start by writing the failure in one sentence. Include what happened, what was expected, where it happened, and how often it happens.</p>\n<p>A weak statement is &quot;login is broken&quot;. A useful statement is &quot;password reset returns a 500 response after the token has expired, but only when the request includes a redirect parameter&quot;.</p>\n<p>The precise statement controls the investigation. If the statement changes, write the new version down. Many debugging sessions fail because the team silently switches between different bugs.</p>\n<h2>Reproduce the problem</h2>\n<p>A bug that cannot be reproduced can still be investigated, but it is harder to prove that it has been fixed. Try to create the smallest repeatable case.</p>\n<p>Check:</p>\n<ul>\n<li>Input data.</li>\n<li>Environment.</li>\n<li>Version.</li>\n<li>Configuration.</li>\n<li>Time and timezone.</li>\n<li>Network dependencies.</li>\n<li>Feature flags.</li>\n<li>User permissions.</li>\n<li>Browser, runtime, or operating system.</li>\n</ul>\n<p>When reproduction requires production data or timing, capture safe diagnostic facts rather than copying private data. Use synthetic data where possible.</p>\n<h2>Check the recent change set</h2>\n<p>Recent changes are not always the cause, but they are a useful starting point. Review code changes, dependency updates, configuration changes, data migrations, infrastructure changes, and scheduled jobs.</p>\n<p>Avoid anchoring on the first suspicious change. Treat it as a hypothesis, then prove or disprove it.</p>\n<h2>Read the error and stack trace fully</h2>\n<p>Read the first error, the final error, and the stack frames between them. Do not stop at the top line if it is a wrapper error. 
Do not ignore the caused-by chain if the platform provides one.</p>\n<p>A stack trace answers three questions:</p>\n<ul>\n<li>Where was the error raised?</li>\n<li>Which path reached that point?</li>\n<li>Which layer translated or swallowed the original failure?</li>\n</ul>\n<p>If the stack crosses framework or vendor code, find the last frame that belongs to the project. That is often the best place to inspect state.</p>\n<h2>Form one hypothesis at a time</h2>\n<p>A hypothesis should be testable. &quot;The cache is wrong&quot; is vague. &quot;The user profile cache key omits the region, so users with the same id in different regions can collide&quot; is testable.</p>\n<p>Write down the observation that would disprove the hypothesis. This guards against confirmation bias, the tendency to favour evidence that supports what you already believe.</p>\n<h2>Add the smallest useful instrumentation</h2>\n<p>Use the lowest impact tool that can answer the current question. That may be a log line, a breakpoint, a watch expression, a database query, a trace, or a metric.</p>\n<p>Use debuggers when you need runtime state. Breakpoints pause execution at a chosen line so you can inspect variables, evaluate expressions, and walk the call stack. Step controls help you follow the exact path. Watch expressions help when a value changes over time.</p>\n<p>Use logging when timing, concurrency, deployment, or remote execution makes a live debugger unsafe or impractical. Log facts, not guesses. 
Include identifiers that let related events be grouped, but avoid secrets and personal data.</p>\n<h2>Narrow the boundary</h2>\n<p>Find the smallest boundary where input is correct and output is wrong, then move the boundary inward.</p>\n<p>Common boundaries include:</p>\n<ul>\n<li>HTTP request and response.</li>\n<li>Function input and return value.</li>\n<li>Queue message production and consumption.</li>\n<li>Database read and write.</li>\n<li>Cache lookup and store.</li>\n<li>Serialisation and parsing.</li>\n<li>Third-party request and response.</li>\n</ul>\n<p>This method turns a large failure into a smaller one. Once the bad boundary is known, inspect only the code that can affect it.</p>\n<h2>Compare a passing case with a failing case</h2>\n<p>A diff between a passing and a failing case is often more useful than a large log. Compare inputs, headers, configuration, permissions, timestamps, dependency versions, data shape, and execution path.</p>\n<p>Keep the cases as similar as possible and change one variable at a time. If several variables differ, the comparison may prove nothing.</p>\n<h2>Be careful with time, state, and concurrency</h2>\n<p>Intermittent bugs often involve mutable state, time, scheduling, retries, caches, or concurrent access. Check whether the failure depends on order.</p>\n<p>Ask:</p>\n<ul>\n<li>Does it fail on the first run or only after warm-up?</li>\n<li>Does it fail after a cache entry expires?</li>\n<li>Does it fail around daylight saving changes or date boundaries?</li>\n<li>Does it fail under parallel execution?</li>\n<li>Does a retry hide the original error?</li>\n</ul>\n<p>For concurrency bugs, adding logs or breakpoints can change timing. Prefer targeted instrumentation and repeatable stress tests.</p>\n<h2>Prove the fix before cleaning up</h2>\n<p>A fix is proven when the failing reproduction passes and relevant existing tests still pass. 
Add a regression test when the behaviour should stay fixed.</p>\n<p>The regression test should fail before the fix and pass after it. If that is not practical, document why and add the closest reliable coverage.</p>\n<p>Do not delete diagnostic notes too early. They may be needed for the pull request description, incident review, or future debugging.</p>\n<h2>Write the conclusion</h2>\n<p>Close the loop with a short explanation:</p>\n<ul>\n<li>Symptom.</li>\n<li>Root cause.</li>\n<li>Fix.</li>\n<li>Test evidence.</li>\n<li>Follow-up risk if any.</li>\n</ul>\n<p>This prevents the same investigation from being repeated later. It also separates the real cause from the guesses that were explored along the way.</p>\n<h2>Conclusion</h2>\n<p>Methodical debugging is a discipline of precision. State the failure, reproduce it, inspect the evidence, test one hypothesis at a time, narrow the boundary, and prove the fix. The checklist is simple because the hard part is resisting guesses.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "cli",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/designing-a-clean-rest-api",
      "url": "https://soulstack.co.uk/blog/designing-a-clean-rest-api",
      "title": "Designing a clean REST API",
      "summary": "A clean REST API is boring in the best way: resources are easy to find, methods mean what HTTP says they mean, responses are predictable, and failures are clear enough for clients…",
      "content_html": "<p>A clean REST API is boring in the best way: resources are easy to find, methods mean what HTTP says they mean, responses are predictable, and failures are clear enough for clients to handle without guesswork.</p>\n<h2>Start with resources, not actions</h2>\n<p>Model the API around resources that have stable identities. Use plural nouns for collections and concrete identifiers for individual resources, such as <code>/orders</code> and <code>/orders/{orderId}</code>. Keep actions out of ordinary CRUD paths. A path like <code>/orders/{orderId}/cancel</code> can be valid when cancellation is a domain operation with its own rules, but it should not replace normal use of HTTP methods.</p>\n<p>Use names that match the domain language. Avoid leaking database tables, internal service names, implementation types, or workflow steps into public paths. Once a path is public, clients will build against it.</p>\n<h2>Use HTTP methods by their semantics</h2>\n<p>Use <code>GET</code> to read a resource or collection. <code>GET</code> is safe, so it must not change server state just because a client reads something. Use <code>POST</code> to create subordinate resources or run operations that are not idempotent. Use <code>PUT</code> when the client replaces a resource at a known URI. Use <code>PATCH</code> for partial updates when the patch format and merge rules are documented. Use <code>DELETE</code> to remove a resource or make it unavailable.</p>\n<p>Idempotency matters. HTTP defines <code>PUT</code>, <code>DELETE</code>, and the safe methods as idempotent, while <code>POST</code> and <code>PATCH</code> are neither safe nor idempotent. Idempotent does not mean every repeated request returns the same status code. It means the intended effect on the server is the same after one request or many identical requests. Design with that distinction in mind.</p>\n<h2>Make status codes specific but not clever</h2>\n<p>Return status codes that match the outcome. 
Use <code>200</code> when a response includes a successful representation, <code>201</code> when a resource was created, <code>202</code> when work was accepted but not finished, and <code>204</code> when the operation succeeded and no body is useful. Use <code>400</code> for invalid request syntax or shape, <code>401</code> for missing or invalid authentication, <code>403</code> for authenticated clients that are not allowed to act, <code>404</code> when the resource is not available to that client, and <code>409</code> for conflicts with the current state of the resource.</p>\n<p>Do not invent application success or failure codes inside a <code>200</code> response. Clients, proxies, SDKs, logs, tracing tools, and load balancers already understand HTTP status classes. Use them.</p>\n<h2>Standardise error bodies</h2>\n<p>Use a consistent error shape. For HTTP APIs, the problem details format is the current standard for machine-readable error responses. At minimum, include a stable error type, a short title, the HTTP status, a request-specific detail when safe to share, and an instance or correlation value that support can trace.</p>\n<p>Do not expose stack traces, SQL fragments, provider error dumps, secrets, internal hostnames, or private identifiers. Error messages are part of the API contract. They must help the caller fix the request without exposing the internals of the system.</p>\n<h2>Keep request and response shapes consistent</h2>\n<p>Use one naming convention for JSON fields and apply it everywhere. Avoid mixing <code>createdAt</code>, <code>created_at</code>, and <code>created</code> across the same API. Use date and time strings with an explicit offset or UTC marker for instants, following the internet timestamp profile of ISO 8601. State whether date-only fields are calendar dates rather than timestamps.</p>\n<p>Represent money as an object with amount and currency, or as the smallest currency unit with clear naming. 
Avoid floating point values for money. Represent booleans as booleans, not strings. Represent identifiers as strings unless the client is expected to perform numeric operations on them.</p>\n<h2>Design collections deliberately</h2>\n<p>Every collection that can grow should have pagination. Filtering and sorting must be documented as part of the collection contract, not added as ad hoc query parameters. Choose stable default ordering, because pagination without deterministic ordering produces duplicates and gaps when data changes.</p>\n<p>Keep collection items useful but bounded. A list response should include the fields needed to choose or display an item. It should not include every nested object by default. Use explicit expansion or follow-up requests when large related resources are needed.</p>\n<h2>Document the contract</h2>\n<p>Keep an OpenAPI description as part of the source, review it with the code, and generate or validate examples from it where possible. Each operation should document parameters, request bodies, response bodies, status codes, authentication requirements, rate limits, and error cases.</p>\n<p>Examples are part of the contract. Keep them small, realistic, and valid. A correct but abstract schema is not enough for a client engineer who needs to know what the API actually returns.</p>\n<h2>Version for compatibility, not convenience</h2>\n<p>A clean API changes without surprising clients. Additive changes are usually safe: new optional request fields, new response fields, new endpoints, and new enum values when clients are told to ignore unknown values. Breaking changes need a versioning and deprecation plan.</p>\n<p>Do not publish internal refactors as API versions. A version should exist because the client contract changed, not because the server implementation changed.</p>\n<h2>Conclusion</h2>\n<p>A clean REST API is a stable contract over HTTP. 
Good resource names, correct method semantics, specific status codes, consistent error bodies, deliberate collection design, and accurate documentation reduce integration work for every client. The best API design is not flashy. It is predictable under success, failure, growth, and change.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "api",
        "architecture",
        "web"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/designing-for-graceful-degradation",
      "url": "https://soulstack.co.uk/blog/designing-for-graceful-degradation",
      "title": "Designing for graceful degradation",
      "summary": "Graceful degradation means a system continues to provide reduced but useful behaviour when part of it is slow, overloaded, or unavailable. It is not the same as hiding failure. Th…",
      "content_html": "<p>Graceful degradation means a system continues to provide reduced but useful behaviour when part of it is slow, overloaded, or unavailable. It is not the same as hiding failure. The user, operator, or caller should still get a clear signal where correctness, freshness, or completeness has changed.</p>\n<h2>Start with the critical path</h2>\n<p>List what must work for the most important user journeys. Then identify which dependencies are required, which are optional, and which can be deferred. This is a design decision, not an incident response improvisation.</p>\n<p>A checkout flow might require product identity, price, stock reservation, payment, and order creation. It might not require recommendations, analytics, marketing tags, or a personalised banner. A read page might serve cached data when a ranking service is unavailable, but an account deletion flow should not fake success if the deletion did not happen.</p>\n<h2>Define acceptable reduction</h2>\n<p>Degradation has to be explicit. Examples include serving cached content, reducing result quality, disabling optional widgets, lowering image quality, delaying non-critical notifications, using a simpler ranking algorithm, or switching to a read-only mode.</p>\n<p>Each reduction needs a correctness boundary. Cached data might be acceptable for documentation, product descriptions, or public content. It might be unacceptable for balances, permissions, legal notices, or inventory that drives purchasing decisions.</p>\n<h2>Timeouts and budgets</h2>\n<p>A dependency without a timeout can consume the entire request budget. Set timeouts per dependency and keep them shorter than the caller&#39;s total deadline. A timeout should leave enough time to return a fallback or a useful error.</p>\n<p>Budgets also apply to retries. A retry that exceeds the user&#39;s request deadline is wasted work. During overload, retries can increase traffic and make the failure worse. 
Use bounded retries, backoff, jitter, and retry budgets.</p>\n<h2>Isolation</h2>\n<p>A failing optional dependency should not exhaust shared resources needed by critical paths. Use separate connection pools, bulkheads, queues, thread pools, or rate limits where appropriate. Isolation keeps a slow analytics call from consuming the same resources needed for login, checkout, or health checks.</p>\n<p>Circuit breakers can stop repeated calls to a dependency that is already failing. They should be paired with observability and careful recovery behaviour. A circuit breaker that opens silently can turn a short dependency issue into a prolonged feature outage.</p>\n<h2>Load shedding</h2>\n<p>When demand exceeds capacity, refusing some work can protect the system as a whole. Load shedding should happen as early as possible and should prefer low value, expensive, or retryable work over critical work.</p>\n<p>Return clear responses. For HTTP APIs, use status codes and headers that let clients distinguish overload from validation failure. For internal systems, propagate a structured error so callers can decide whether to retry, fall back, or fail.</p>\n<h2>Fallbacks</h2>\n<p>A fallback is production code and must be tested like production code. A stale cache, default response, alternate provider, or simplified algorithm can be wrong, slow, or unavailable too.</p>\n<p>Avoid fallbacks that create hidden data corruption. If a fraud check is unavailable, the safe fallback might be manual review or deferred fulfilment, not blind approval. If a permission service is unavailable, the safe fallback is usually deny or read-only, not allow.</p>\n<h2>User experience</h2>\n<p>Graceful degradation should be visible where it changes user expectations. A user can tolerate a missing recommendation panel. 
They need a clear message if a report is delayed, data is stale, or a write action cannot be confirmed.</p>\n<p>Do not show success before the system has accepted responsibility for the operation. For asynchronous work, show a pending state and provide a way to refresh or inspect the outcome.</p>\n<h2>Observability</h2>\n<p>Operators need to see degradation as a first-class state. Track fallback rates, circuit breaker state, cache staleness, timeout rates, shed load, retry rates, and dependency latency. Alert on sustained degradation even when the top-level service is still returning successful responses.</p>\n<p>Logs should include safe correlation identifiers and the chosen degradation path. Metrics should distinguish full success from degraded success. Otherwise the system can look healthy while users receive reduced behaviour.</p>\n<h2>Testing</h2>\n<p>Test degraded modes before incidents. Use fault injection, dependency timeouts, disabled features, load tests, and deployment drills. Verify that the fallback path does not call the same failing dependency indirectly.</p>\n<p>Runbook entries should describe how to enable, disable, and verify degraded modes. Feature flags can help, but they need ownership, default states, audit trails, and cleanup. A forgotten flag is technical debt and operational risk.</p>\n<h2>Trade-offs</h2>\n<p>Graceful degradation can increase complexity. Every fallback is another path to build, test, secure, observe, and maintain. Use it where the business value justifies the cost. Critical read paths, high-traffic user journeys, and expensive dependencies are common candidates.</p>\n<p>The strongest design is often to remove optional work from the critical path before adding complex fallback logic. Deferred processing, cached precomputation, and simpler dependency graphs can reduce the need for degradation during incidents.</p>\n<h2>Conclusion</h2>\n<p>Graceful degradation is a reliability design technique, not a slogan. 
Decide which behaviour can be reduced, isolate critical paths, set timeouts and budgets, make fallbacks safe, and observe degraded success separately from full success. A degraded system should be honest, useful, and recoverable.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "reliability",
        "performance"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/dns-for-developers-records-ttls-and-propagation",
      "url": "https://soulstack.co.uk/blog/dns-for-developers-records-ttls-and-propagation",
      "title": "DNS for developers: records, TTLs and propagation",
      "summary": "DNS maps names to the records that applications and infrastructure need. For most developers the work comes down to three things: choosing the right record type, setting a sensibl…",
      "content_html": "<p>DNS maps names to the records that applications and infrastructure need. For most developers the work comes down to three things: choosing the right record type, setting a sensible TTL, and understanding why a change does not appear everywhere at the same instant.</p>\n<h2>DNS is a distributed database</h2>\n<p>A resolver starts with a name and asks the DNS hierarchy for the answer. The final answer normally comes from the authoritative name servers for the zone that holds the record.</p>\n<p>Applications rarely query authoritative servers directly. They ask a recursive resolver, which performs the lookup on their behalf and caches the answer for reuse. Almost everything that feels confusing about DNS comes back to that caching layer.</p>\n<h2>Zones and authoritative servers</h2>\n<p>A zone is an administrative slice of the DNS namespace. Authoritative name servers hold the records for that zone. When a domain is delegated, records in the parent zone point resolvers towards the authoritative name servers for the child zone.</p>\n<p>For day to day work the important question is which provider is authoritative for the domain or subdomain you are editing. Changing records in the wrong provider has no effect on the answers users actually receive.</p>\n<h2>Common record types</h2>\n<p>A records map a name to an IPv4 address. AAAA records map a name to an IPv6 address. CNAME records make one name an alias of another name. MX records identify mail exchangers. TXT records hold text values and are commonly used for domain verification and email authentication.</p>\n<p>CNAME records need care. If a CNAME record is present at a name, no other data should be present at that same name. The record is an alias, so it cannot sit alongside A, MX or TXT records on the same owner name. 
Many providers offer CNAME-like convenience at the zone apex, but those are provider features rather than plain CNAME records.</p>\n<h2>TTL controls cache lifetime</h2>\n<p>TTL means time to live. It is a value in seconds that tells resolvers how long they may cache a DNS answer. A TTL of 300 allows caching for five minutes. A TTL of 86400 allows caching for 24 hours.</p>\n<p>The TTL is a ceiling, not a promise. A resolver may discard an answer sooner, but it should not keep serving the cached answer once the TTL has elapsed. Lower TTLs let planned changes take effect sooner once caches refresh. Higher TTLs reduce lookup volume and can improve resilience when authoritative DNS is briefly unreachable.</p>\n<h2>Propagation is mostly caching</h2>\n<p>People say DNS is propagating, but most of the waiting is resolver cache expiry. After an authoritative record changes, resolvers that already cached the old answer can keep returning it until the old TTL expires.</p>\n<p>Some resolvers cap or floor the TTL values they honour. Negative answers, such as a name that does not exist, can also be cached for a period controlled by the zone&#39;s SOA record. That is why two users can see different answers during a change window.</p>\n<h2>Plan changes before you make them</h2>\n<p>For a planned migration, lower the TTL well before the cutover. Wait at least as long as the previous TTL so that caches have had time to pick up the lower value. Then make the record change. After the new target is stable, raise the TTL again if a longer value suits the record.</p>\n<p>This does not make a change instant, but it caps the time that well-behaved caches should keep the old answer.</p>\n<h2>Apex records need provider awareness</h2>\n<p>The zone apex is the bare domain, such as example.com. A plain CNAME is not valid at the apex, because the apex must carry the SOA and NS records that describe the zone, and a CNAME cannot coexist with other data at the same name. 
Many providers support ALIAS, ANAME, CNAME flattening or similar features to point an apex at another name.</p>\n<p>These features are provider-specific. Read the provider documentation and confirm what records are actually returned to resolvers.</p>\n<h2>DNS is not health checking by default</h2>\n<p>A normal A or AAAA record does not know whether the target service is healthy. It points to the configured address until the record changes and caches expire.</p>\n<p>Some managed DNS providers offer health-checked or load-balanced records. Those are service features layered on top of DNS, and they still interact with TTLs and resolver caching.</p>\n<h2>Debugging DNS</h2>\n<p>Start by querying the authoritative name servers. That tells you what the source of truth currently says. Then query a public recursive resolver and the resolver used by the affected environment, and compare answers, TTLs and record types.</p>\n<p>Check delegation too. If the parent zone points to different name servers than the ones you are editing, your changes will not be visible to normal resolvers.</p>\n<p>Use exact names. A missing trailing dot in a provider UI, an unexpected subdomain, or editing www when the application uses the apex can all make a correct record look broken.</p>\n<h2>Conclusion</h2>\n<p>DNS changes are reliable when you know the authoritative provider, use the right record type and plan around TTLs. Treat propagation as cache expiry rather than magic. For migrations, lower the TTL in advance, verify the authoritative answer, and test from the resolvers your users actually use.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "web",
        "reliability",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/end-of-shift-left-security-belongs-everywhere-now",
      "url": "https://soulstack.co.uk/blog/end-of-shift-left-security-belongs-everywhere-now",
      "title": "The end of shift left: security belongs everywhere now",
      "summary": "Shift left was useful because it challenged the idea that security starts after code is written. It becomes harmful when teams interpret it as moving all security work onto develo…",
      "content_html": "<p>Shift left was useful because it challenged the idea that security starts after code is written. It becomes harmful when teams interpret it as moving all security work onto developers and calling the job done. Security does not belong on the left. It belongs throughout the software lifecycle.</p>\n<h2>Shift left solved the wrong part alone</h2>\n<p>Finding a vulnerability earlier is usually cheaper than finding it later. That principle still holds. The problem is that early checks only cover what can be known early.</p>\n<p>A dependency may be safe when merged and vulnerable later. A secret may be introduced through a later commit. A runtime path may only become risky under production traffic. A misconfigured identity policy may not be visible from application code. A model generated change may look harmless until it is combined with a workflow token, a deployment permission or privileged automation.</p>\n<p>Security therefore needs stages, not slogans.</p>\n<h2>Developers need defaults, not lectures</h2>\n<p>Experienced engineers do not need another poster telling them to care about security. They need secure defaults that are easier than the insecure alternatives.</p>\n<p>That means dependency review in pull requests, secret scanning across history and branches, code scanning that reports inside the developer workflow, templates that set least privilege permissions, and deployment policies that make risky changes visible before production.</p>\n<p>Security controls should produce useful feedback at the point of action. A scanner that reports hundreds of low value findings after a release will be ignored. A check that blocks a dangerous dependency change in the pull request has a clear owner and a clear moment for action.</p>\n<h2>Runtime still matters</h2>\n<p>Preventive controls are necessary, but they do not remove the need for detection and response. 
Production systems need logging, alerting, vulnerability management, identity review, incident response and recovery practice.</p>\n<p>A secure pipeline cannot prove that a running system is safe. It can only reduce known risks before deployment. Runtime controls catch drift, abuse, newly disclosed vulnerabilities, exposed credentials and behaviour that was not visible in source review.</p>\n<p>The useful model is continuous assurance. Design reviews catch architectural risk. Pull request checks catch change risk. Deployment gates catch release risk. Runtime telemetry catches operational risk.</p>\n<h2>Ownership must be explicit</h2>\n<p>Security fails when every team assumes another team is handling it. A platform team may own secure deployment primitives. A security team may own policy and assurance. Product teams may own service-level remediation. Operations may own incident response. None of that works unless responsibility is written down.</p>\n<p>Every control should have an owner, a failure mode and an escalation path. If a dependency alert fires, who triages it? If a secret is found, who rotates it? If a deployment violates policy, who can approve an exception? If nobody can answer, the control is theatre.</p>\n<h2>Security work should be prioritised like reliability work</h2>\n<p>Security findings compete with feature work. Pretending otherwise creates queues that never clear. Teams need severity rules, service ownership, deadlines and a way to distinguish exploitable risk from noise.</p>\n<p>The goal is not to maximise findings. The goal is to reduce meaningful risk while preserving delivery. That requires tuning, suppression with accountability, and regular review of whether controls are catching real problems.</p>\n<h2>Conclusion</h2>\n<p>Shift left was never enough. Modern security belongs in design, code review, dependency management, CI, deployment, runtime and incident response. The best security programmes do not move work left. 
They put the right control at the right point, with a named owner and feedback that engineers can act on.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "devops",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/finding-things-fast-grep-find-ripgrep-and-fd",
      "url": "https://soulstack.co.uk/blog/finding-things-fast-grep-find-ripgrep-and-fd",
      "title": "Finding things fast: grep, find, ripgrep and fd",
      "summary": "Fast search starts with choosing the right question. Use grep and ripgrep when you are searching file contents. Use find and fd when you are searching paths, file names or file me…",
      "content_html": "<p>Fast search starts with choosing the right question. Use <code>grep</code> and <code>ripgrep</code> when you are searching file contents. Use <code>find</code> and <code>fd</code> when you are searching paths, file names or file metadata. Get that split right and most search problems become easy.</p>\n<h2>Use grep for dependable text matching</h2>\n<p><code>grep</code> prints lines that match a pattern. It is installed almost everywhere, which makes it a good default for scripts and portable documentation.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#A5D6FF\"> 'listen'</span><span style=\"color:#A5D6FF\"> nginx.conf</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> -R</span><span style=\"color:#A5D6FF\"> 'TODO'</span><span style=\"color:#A5D6FF\"> src</span></span></code></pre></div><p>Use <code>-n</code> for line numbers and <code>-i</code> for case-insensitive matching.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> -Rni</span><span style=\"color:#A5D6FF\"> 'deprecated'</span><span style=\"color:#A5D6FF\"> src</span></span></code></pre></div><p>Use <code>-F</code> when the pattern is fixed text, not a regular expression. 
This avoids regex interpretation for characters such as <code>.</code> or <code>[</code>.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> -RF</span><span style=\"color:#A5D6FF\"> 'user.name'</span><span style=\"color:#A5D6FF\"> config</span></span></code></pre></div><p>Use <code>--</code> before paths or patterns that may begin with a hyphen, so they are not mistaken for options.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> '-Xmx'</span><span style=\"color:#A5D6FF\"> jvm.options</span></span></code></pre></div><h2>Use find for precise file selection</h2>\n<p><code>find</code> walks directory trees and evaluates expressions against each path. 
It is the right tool when you care about file type, name, size, time or permissions.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.log'</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -size</span><span style=\"color:#A5D6FF\"> +10M</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> d</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> node_modules</span></span></code></pre></div><p>Use <code>-exec ... 
{} +</code> to pass many matched paths to a command in as few invocations as possible.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.md'</span><span style=\"color:#79C0FF\"> -exec</span><span style=\"color:#A5D6FF\"> wc</span><span style=\"color:#79C0FF\"> -l</span><span style=\"color:#A5D6FF\"> {}</span><span style=\"color:#A5D6FF\"> +</span></span></code></pre></div><p>Avoid parsing <code>find</code> output with whitespace-delimited loops. File names can contain spaces, tabs and newlines. Use <code>-print0</code> with tools that accept null-delimited input.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.log'</span><span style=\"color:#79C0FF\"> -print0</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> xargs</span><span style=\"color:#79C0FF\"> -0</span><span style=\"color:#A5D6FF\"> gzip</span></span></code></pre></div><p>When the command you are running can be expressed with <code>find</code> itself, prefer that. 
For example, deletion can be handled directly.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.tmp'</span><span style=\"color:#79C0FF\"> -delete</span></span></code></pre></div><p>Preview destructive <code>find</code> commands with <code>-print</code>, and only swap in <code>-delete</code> once the listed paths look right. Note that <code>-print0</code> and <code>-delete</code> are widely available but have not traditionally been part of the POSIX predicate set, so confirm support if you target unusual systems. The <code>-exec ... {} +</code> form, by contrast, is specified by POSIX.</p>\n<h2>Use ripgrep for code search</h2>\n<p><code>ripgrep</code>, usually run as <code>rg</code>, is designed for recursive text search. By default it respects ignore files such as <code>.gitignore</code> when it detects a source control repository, and it skips hidden files and binary files.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#A5D6FF\"> 'createUser'</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#A5D6FF\"> 'createUser'</span><span style=\"color:#A5D6FF\"> src</span></span></code></pre></div><p>Use <code>-S</code> for smart case. 
It searches case-insensitively when the pattern is all lower case and case-sensitively when the pattern contains an uppercase letter.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#79C0FF\"> -S</span><span style=\"color:#A5D6FF\"> 'userid'</span></span></code></pre></div><p>Use <code>-g</code> to include or exclude globs.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#A5D6FF\"> 'timeout'</span><span style=\"color:#79C0FF\"> -g</span><span style=\"color:#A5D6FF\"> '*.ts'</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#A5D6FF\"> 'timeout'</span><span style=\"color:#79C0FF\"> -g</span><span style=\"color:#A5D6FF\"> '!dist'</span></span></code></pre></div><p>Use <code>--hidden</code> when dotfiles matter. The <code>-u</code> flag relaxes filtering in stages: <code>-u</code> stops respecting ignore files, <code>-uu</code> also searches hidden files, and <code>-uuu</code> also searches binary files. 
Reach for these only when you intentionally want to widen the search.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#79C0FF\"> --hidden</span><span style=\"color:#A5D6FF\"> 'theme'</span><span style=\"color:#A5D6FF\"> ~/.config</span></span></code></pre></div><p>For code review, <code>rg</code> is often faster and quieter than <code>grep -R</code> because the default ignore behaviour matches what developers usually mean by project search.</p>\n<h2>Use fd for friendly path search</h2>\n<p><code>fd</code> is a user-friendly alternative to <code>find</code> for common path searches. It does not aim to implement every <code>find</code> feature. Its defaults are designed for interactive use.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">fd</span><span style=\"color:#A5D6FF\"> config</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">fd</span><span style=\"color:#A5D6FF\"> '\\.md$'</span></span></code></pre></div><p>Like <code>ripgrep</code>, <code>fd</code> respects ignore files by default when searching inside a git repository, and it skips hidden files unless requested otherwise.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span 
style=\"color:#FFA657\">fd</span><span style=\"color:#79C0FF\"> --hidden</span><span style=\"color:#A5D6FF\"> ssh</span><span style=\"color:#A5D6FF\"> ~/.config</span></span></code></pre></div><p>Use <code>-e</code> to match file extensions and <code>-x</code> to run a command for each result.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">fd</span><span style=\"color:#79C0FF\"> -e</span><span style=\"color:#A5D6FF\"> md</span><span style=\"color:#79C0FF\"> -x</span><span style=\"color:#A5D6FF\"> wc</span><span style=\"color:#79C0FF\"> -l</span></span></code></pre></div><p>Use <code>find</code> instead of <code>fd</code> in scripts that need POSIX-style availability or advanced predicates. Use <code>fd</code> interactively when its defaults match the question.</p>\n<h2>Combine tools by responsibility</h2>\n<p>Use the path tool to choose files, then the content tool to inspect them.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.ts'</span><span style=\"color:#79C0FF\"> -exec</span><span style=\"color:#A5D6FF\"> grep</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#A5D6FF\"> 'TODO'</span><span style=\"color:#A5D6FF\"> {}</span><span style=\"color:#A5D6FF\"> +</span></span></code></pre></div><p>For interactive project 
search, <code>rg</code> can often do both selection and matching.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#A5D6FF\"> 'TODO'</span><span style=\"color:#79C0FF\"> -g</span><span style=\"color:#A5D6FF\"> '*.ts'</span></span></code></pre></div><p>For exact path selection followed by fast content search, combine null-delimited streams where supported.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">fd</span><span style=\"color:#79C0FF\"> -0</span><span style=\"color:#79C0FF\"> -e</span><span style=\"color:#A5D6FF\"> ts</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> xargs</span><span style=\"color:#79C0FF\"> -0</span><span style=\"color:#A5D6FF\"> rg</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#A5D6FF\"> 'TODO'</span></span></code></pre></div><p>Check each tool&#39;s null-delimited options before using this pattern. Correct delimiter handling matters more than a shorter command.</p>\n<h2>Make searches reproducible</h2>\n<p>Document whether hidden files, ignored files and binary files are included. 
Many search disagreements come from different defaults, not different data.</p>\n<p>Good search commands make scope visible.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rg</span><span style=\"color:#79C0FF\"> --hidden</span><span style=\"color:#79C0FF\"> -g</span><span style=\"color:#A5D6FF\"> '!node_modules'</span><span style=\"color:#A5D6FF\"> 'api_key'</span><span style=\"color:#A5D6FF\"> .</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.pem'</span><span style=\"color:#79C0FF\"> -print</span></span></code></pre></div><p>Use fixed strings when searching for literal tokens. Use regular expressions when pattern syntax is needed. Mixing the two leads to false positives and missed matches.</p>\n<h2>Conclusion</h2>\n<p>Use <code>grep</code> for portable text matching, <code>find</code> for precise filesystem predicates, <code>ripgrep</code> for fast project search and <code>fd</code> for friendly interactive path search. The fastest search is not just the quickest command, it is the command whose scope and matching rules are clear.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "cli",
        "devops",
        "performance"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/finops-for-engineers-cutting-cloud-waste-without-killing-velocity",
      "url": "https://soulstack.co.uk/blog/finops-for-engineers-cutting-cloud-waste-without-killing-velocity",
      "title": "FinOps for engineers: cutting cloud waste without killing velocity",
      "summary": "Cloud cost work fails when engineers only see it as finance asking them to spend less. Useful FinOps is different. It gives engineers timely cost data, connects spend to technical…",
      "content_html": "<p>Cloud cost work fails when engineers only see it as finance asking them to spend less. Useful FinOps is different. It gives engineers timely cost data, connects spend to technical decisions and helps teams remove waste without slowing delivery.</p>\n<h2>Cost is an engineering signal</h2>\n<p>Latency, error rate and saturation are normal engineering signals. Cost should be treated the same way. A service that becomes cheaper without losing reliability has improved. A service that becomes expensive because of retries, inefficient storage, oversized compute or uncontrolled data transfer is signalling a technical problem.</p>\n<p>The key is context. A monthly bill at account level is too late and too broad. Engineers need cost visibility by service, environment, team and workload, where the tagging and allocation model can support it.</p>\n<h2>Separate usage from rate</h2>\n<p>Cloud spend is shaped by both usage and rate. Usage is what systems consume: compute hours, storage, requests, data transfer and managed service capacity. Rate is what each unit costs after pricing model, discounts and commitments.</p>\n<p>Engineers usually have the most direct influence over usage. Finance and platform teams may have more influence over commitments, discounts and commercial terms. Mixing the two creates confusion. A product team should not be blamed for missing a purchasing discount, and finance cannot optimise a wasteful architecture from a spreadsheet.</p>\n<h2>Do not optimise blind</h2>\n<p>Cost reduction without service context is risky. Turning down capacity may save money and damage availability. Removing logs may reduce storage and damage incident response. Aggressive scaling may reduce idle time and increase latency.</p>\n<p>Good FinOps work starts with unit economics: cost per customer, transaction, request, build, environment or other meaningful unit. The right unit depends on the product. 
The point is to connect cost to value rather than treat all spend as equal.</p>\n<h2>Put guardrails in the platform</h2>\n<p>Engineers should not need to remember every cost rule. The platform can provide defaults for resource limits, autoscaling, retention, environment expiry, storage classes and budget alerts.</p>\n<p>Guardrails should prevent obvious waste while preserving legitimate exceptions. A development environment that never expires is usually waste. A production database with extra capacity during a launch may be justified. The system should make both visible.</p>\n<h2>Make waste easy to remove</h2>\n<p>FinOps backlogs fail when recommendations are vague. A useful recommendation names the service, the owner, the likely action, the expected saving, the risk and the validation step.</p>\n<p>For example: reduce a non production retention period, delete unattached storage, right size a consistently idle workload, move infrequently accessed data to a cheaper class, or fix a retry storm. Each action should be small enough to review and reversible where possible.</p>\n<h2>Conclusion</h2>\n<p>FinOps for engineers is not a cost cutting campaign. It is an operating model for making cost visible, attributable and actionable. The best programmes reduce waste by improving engineering feedback loops, not by creating approval queues that slow teams down.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "devops",
        "architecture",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/from-copilot-to-coworker-how-ai-agents-change-pull-requests",
      "url": "https://soulstack.co.uk/blog/from-copilot-to-coworker-how-ai-agents-change-pull-requests",
      "title": "From copilot to coworker: how AI agents change pull requests",
      "summary": "AI coding tools are moving from inline suggestions to agents that open pull requests on their own. That shift changes what a pull request is. It stops being a record of human work…",
      "content_html": "<p>AI coding tools are moving from inline suggestions to agents that open pull requests on their own. That shift changes what a pull request is. It stops being a record of human work and becomes a boundary where a human, an agent and the delivery system meet, and where responsibility for the change has to land somewhere.</p>\n<h2>The pull request becomes the work contract</h2>\n<p>When an agent opens a pull request, the description carries more weight than usual. It should state the task, the acceptance criteria, the files changed, the validation performed and any known limitations. A reviewer should not have to reverse engineer the goal from the diff, because the agent cannot be questioned the way a colleague can.</p>\n<p>The pull request is also where accountability returns to the team. An agent can propose work. A human and the organisation decide whether that work is acceptable, and they own the result once it merges. Nothing about generated code changes that, so the pull request has to make the ownership explicit rather than implied.</p>\n<h2>Evidence matters more than summaries</h2>\n<p>A generated change can read as coherent while missing context the model never had. The pull request should carry evidence that is hard to fabricate: test results, build logs, dependency review and links to the issues the change is meant to address. That evidence is what a reviewer signs off against.</p>\n<p>A summary is a navigation aid, not proof. It can point a reviewer at the parts of the diff worth reading closely, but it cannot stand in for reading the risky parts. The more fluent the summary, the more important this distinction becomes.</p>\n<h2>CI becomes the second reviewer</h2>\n<p>As agent output grows, continuous integration has to become more targeted and more trustworthy, because it is the check that scales when human attention does not. Syntax and lint checks are the floor. 
The pipeline should validate the things that actually break in production, such as contracts, migrations, dependencies and security policy, scaled to the risk of the change.</p>\n<p>This is not an argument for endless gates. It is an argument that an agent should not be able to create the impression of progress while bypassing the checks that matter. If a check can be skipped quietly, it is not really protecting anything.</p>\n<h2>Permissions should match the task</h2>\n<p>An agent that edits code does not need access to secrets, production environments or privileged workflows by default. Least privilege, short lived access and human approval for sensitive actions keep the blast radius small if the agent behaves unexpectedly. Tokens scoped to the job and valid for minutes are safer than long lived credentials sitting in the environment.</p>\n<p>Treat repository comments, issue bodies and pull request descriptions as untrusted input whenever they steer agent behaviour. Text from these sources can carry instructions the agent was never meant to follow, so an agent workflow should never turn arbitrary text into privileged execution.</p>\n<h2>Review culture will change</h2>\n<p>Reviewing agent output is a different skill from reviewing a colleague. The useful questions become: what did the agent optimise for, what did it leave out, and what could fail once this is running? You cannot ask the author to explain a trade off from memory, so the review has to extract that understanding from the change and its evidence.</p>\n<p>Teams will also need vocabulary for generated work. Labels and policies such as agent authored, human edited, requires security review, requires migration review or safe to merge after checks let a team route a change to the right level of scrutiny instead of treating every pull request the same.</p>\n<h2>Conclusion</h2>\n<p>AI agents do not make pull requests obsolete. They make them more important. 
The pull request becomes the place where generated work is made legible, validated and owned, and where a proposal turns into something a team is willing to stand behind. The teams that adapt will treat agent output as a proposal backed by evidence, not as finished work waiting for a rubber stamp.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "ai",
        "git",
        "devops",
        "security"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/hashing-vs-encrypting-vs-encoding",
      "url": "https://soulstack.co.uk/blog/hashing-vs-encrypting-vs-encoding",
      "title": "Hashing vs encrypting vs encoding",
      "summary": "Hashing, encrypting and encoding are often grouped together, but they solve different problems. Mixing them up leads to weak designs, especially when handling passwords, tokens, k…",
      "content_html": "<p>Hashing, encrypting and encoding are often grouped together, but they solve different problems. Mixing them up leads to weak designs, especially when handling passwords, tokens, keys or personal data. This post is about the conceptual difference between the three, not the step by step of how to store a password.</p>\n<h2>The short version</h2>\n<p>Hashing creates a fixed length digest from input data. A secure cryptographic hash is designed to be one way. You can hash the same input again and compare the result, but you cannot turn the digest back into the original value.</p>\n<p>Encryption protects data so it can be recovered later by someone with the right key. The original data is plaintext, the protected result is ciphertext, and decryption turns the ciphertext back into plaintext.</p>\n<p>Encoding changes the representation of data so another system can carry or store it. It is not a security control. Anyone who knows the encoding can reverse it.</p>\n<h2>Hashing is for integrity and comparison</h2>\n<p>A hash is useful when you need to compare values without storing the original, or when you need to check that data has not changed. The same input always produces the same digest, and any change to the input produces a different one.</p>\n<p>Common uses include file integrity checks, content addressing, digital signatures and password verification. The exact requirements differ by use case. A general purpose hash such as SHA-256 produces a fixed length 256-bit digest and is suitable for integrity checks, but it is not suitable by itself for password storage because it is fast. For passwords you need a purpose built, deliberately slow password hashing algorithm. That is its own topic.</p>\n<h2>Encryption is for confidentiality</h2>\n<p>Encryption is the right tool when authorised software must read the original value later. 
Examples include payment tokens, private notes, backups, session state and sensitive database fields.</p>\n<p>Encryption depends on key management. If the key sits beside the ciphertext with the same access controls, the design has not provided useful protection. Treat encryption keys as secrets: restrict access to them, rotate them when needed and audit their use.</p>\n<p>Use authenticated encryption where available. Authenticated encryption provides confidentiality and lets the recipient check integrity and authenticity, so tampering is detected rather than silently accepted. Without integrity protection, an attacker may be able to change ciphertext and influence the decrypted result.</p>\n<h2>Encoding is for transport and compatibility</h2>\n<p>Encoding makes data safe for a specific format or protocol. Base64 can represent binary data as text. URL encoding can put reserved characters into a URL component. HTML escaping can represent special characters safely in markup.</p>\n<p>Encoding does not hide the value. A Base64 string may look unreadable, but it is not protected. It is only represented differently, and anyone can decode it.</p>\n<h2>The common mistakes</h2>\n<p>Do not call Base64 encryption. It provides no secrecy.</p>\n<p>Do not encrypt passwords for normal login systems. If a password can be decrypted, a stolen key can expose every password at once. Store password verifiers with a slow password hashing algorithm instead.</p>\n<p>Do not use a fast hash for passwords. Fast hashes help attackers test guesses quickly after a database leak.</p>\n<p>Do not invent a custom cryptographic format. Use maintained libraries and standard algorithms. 
Cryptography fails easily when nonce handling, key length, authentication or error handling is wrong.</p>\n<h2>Choosing the right tool</h2>\n<p>Use hashing when the original value should not be recovered and you only need verification or integrity.</p>\n<p>Use encryption when authorised code must recover the original value later.</p>\n<p>Use encoding when data must fit a transport or storage format and no secrecy is required.</p>\n<p>The choice is not about what looks scrambled. It is about whether the operation is one way, key protected, or just a different representation.</p>\n<h2>Conclusion</h2>\n<p>Hashing, encryption and encoding are separate tools. Hashing verifies without recovery. Encryption protects data that must be recovered. Encoding makes data fit another format. Treating encoding as security, or using ordinary encryption where password hashing is required, creates avoidable risk.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/health-checks-and-graceful-shutdown",
      "url": "https://soulstack.co.uk/blog/health-checks-and-graceful-shutdown",
      "title": "Health checks and graceful shutdown",
      "summary": "Health checks and graceful shutdown protect availability during deploys, scaling, dependency failures, and node maintenance. They work only when they describe the real state of th…",
      "content_html": "<p>Health checks and graceful shutdown protect availability during deploys, scaling, dependency failures, and node maintenance. They work only when they describe the real state of the process.</p>\n<h2>Health checks have different jobs</h2>\n<p>A single health endpoint cannot answer every operational question. Readiness, liveness, and startup checks exist for different reasons.</p>\n<p>A readiness check says whether the process should receive traffic. It should fail when the instance cannot serve useful work, even if the process is still running.</p>\n<p>A liveness check says whether the process is stuck and should be restarted. It should fail only when restart is the right recovery action.</p>\n<p>A startup check gives slow starting applications time to initialise before liveness checks begin. It prevents a valid slow start from being treated as a dead process.</p>\n<p>Mixing these meanings causes outages. A liveness check that depends on a slow database can restart every instance during a database incident. A readiness check that always returns success can send traffic to a process that cannot serve it.</p>\n<h2>Design readiness around serving traffic</h2>\n<p>Readiness should reflect whether the instance can accept the traffic it is about to receive.</p>\n<p>A useful readiness check may include:</p>\n<ul>\n<li>required configuration loaded</li>\n<li>critical local resources initialised</li>\n<li>worker pool ready</li>\n<li>queue consumer ready when the instance is a consumer</li>\n<li>dependency state when the dependency is required for every request</li>\n<li>overload state when the instance is intentionally shedding traffic</li>\n</ul>\n<p>Keep readiness fast and bounded. It should not perform expensive deep checks on every probe. 
If a dependency is optional or has a working fallback, do not fail readiness only because that dependency is unavailable.</p>\n<h2>Keep liveness narrow</h2>\n<p>Liveness should detect a process that cannot recover without restart. Examples include a deadlocked event loop, a failed main worker, or an internal state that prevents all future work.</p>\n<p>Do not use liveness as a general dependency test. Restarting healthy application processes will not repair a failed database, a broken network route, or an upstream outage. It can make the incident worse by adding restart storms and cold starts.</p>\n<p>When in doubt, prefer readiness failure over liveness failure. Removing an instance from traffic is usually safer than restarting it.</p>\n<h2>Use startup checks for slow initialisation</h2>\n<p>Some services need time to load caches, run migrations, warm interpreters, create connections, or build local indexes. A startup check lets the platform distinguish slow initialisation from a failed process.</p>\n<p>After startup succeeds, liveness can begin. This is safer than setting a large liveness delay that hides real failures later in the process lifetime.</p>\n<h2>Graceful shutdown starts before termination</h2>\n<p>Graceful shutdown is a sequence, not a signal handler alone. The service should stop accepting new work, let existing work finish within a bounded time, release resources, flush telemetry, and exit.</p>\n<p>A common sequence is:</p>\n<ul>\n<li>receive the termination signal</li>\n<li>mark readiness as false</li>\n<li>stop accepting new requests or messages</li>\n<li>drain in-flight work up to a deadline</li>\n<li>close listeners and consumers</li>\n<li>flush logs, metrics, and traces</li>\n<li>close database and network connections</li>\n<li>exit with a clear status</li>\n</ul>\n<p>The deadline matters. 
A process that never exits will eventually be killed by the platform.</p>\n<h2>Account for load balancers and endpoints</h2>\n<p>Traffic may continue briefly after readiness changes. Endpoint updates, proxies, clients, and load balancers do not all react instantly.</p>\n<p>Design shutdown to tolerate that delay. Stop advertising readiness before closing the listener. Keep serving existing connections during the drain period. Return a clear failure for new work only after the instance has been removed from normal routing, or when the shutdown deadline requires it.</p>\n<h2>Handle background workers deliberately</h2>\n<p>Workers need their own shutdown path. On termination, a worker should stop taking new jobs, finish or safely abandon the current job, and make the job visible for retry when required.</p>\n<p>The correct behaviour depends on the job system. Some jobs are idempotent and can be retried safely. Others need explicit leases, checkpoints, or compensation. The shutdown path should match the delivery and retry semantics of the queue.</p>\n<h2>Test shutdown during deploys</h2>\n<p>A graceful shutdown path that is not tested is likely to fail when deploys or node maintenance happen under load.</p>\n<p>Test these cases:</p>\n<ul>\n<li>termination while serving a long request</li>\n<li>termination while holding a queue job</li>\n<li>termination during dependency slowness</li>\n<li>termination while telemetry export is slow</li>\n<li>repeated rolling deploys under normal traffic</li>\n<li>readiness failure without process restart</li>\n</ul>\n<p>Measure dropped requests, duplicated work, shutdown duration, and time to remove the instance from traffic.</p>\n<h2>Common mistakes</h2>\n<p>Do not make every probe call the database. That turns a database incident into a platform restart incident.</p>\n<p>Do not return ready before the service can serve real traffic. That creates errors during deploys and scaling.</p>\n<p>Do not ignore termination signals. 
By default, the process may exit immediately without draining work.</p>\n<p>Do not make shutdown unbounded. Platforms eventually send a hard kill.</p>\n<p>Do not assume one health endpoint is enough. Separate readiness, liveness, and startup semantics.</p>\n<h2>Conclusion</h2>\n<p>Health checks should tell the platform whether to route traffic, wait for startup, or restart a broken process. Graceful shutdown should drain work before exit. Keep the checks narrow, fast, and truthful, then test them under the same conditions that happen during deploys and failures.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "devops",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/how-http-requests-actually-work",
      "url": "https://soulstack.co.uk/blog/how-http-requests-actually-work",
      "title": "How HTTP requests actually work",
      "summary": "An HTTP request is the unit of work behind a page load, an API call and most browser driven application behaviour. It looks simple from application code, but it crosses DNS, trans…",
      "content_html": "<p>An HTTP request is the unit of work behind a page load, an API call and most browser driven application behaviour. It looks simple from application code, but it crosses DNS, transport, TLS, HTTP semantics, caches, redirects, content negotiation and server routing before a response reaches the caller. This post walks the whole chain end to end, and points to deeper posts on the layers that deserve their own treatment.</p>\n<h2>The request starts with a URL</h2>\n<p>A browser or HTTP client starts with a URL. The URL identifies the scheme, host, optional port, path, query string and fragment. The fragment is not sent to the server because it is client side state.</p>\n<p>For an HTTPS URL, the client needs a network route to the host and a secure connection before it can send the HTTP message. For an HTTP URL, the request is sent without TLS protection and can be observed or changed by parties on the network.</p>\n<h2>DNS resolves the host name</h2>\n<p>The host name must be resolved to an address before the client can connect. The client normally asks a recursive resolver, which may answer from cache or query authoritative DNS servers. The answer can include IPv4 addresses, IPv6 addresses or aliases that lead to other records.</p>\n<p>DNS answers have a time to live. The TTL controls how long resolvers are expected to cache the answer. A low TTL can make planned changes reach users sooner, but it does not force every resolver to discard old answers instantly. For how resolution actually behaves in practice, see dns-for-developers.</p>\n<h2>The client opens a transport connection</h2>\n<p>Most HTTP traffic uses TCP or QUIC as the transport. HTTP/1.1 and HTTP/2 commonly run over TCP. HTTP/3 runs over QUIC, which uses UDP.</p>\n<p>With TCP, the client opens a connection to the server address and port. For HTTPS, port 443 is the conventional default. For plain HTTP, port 80 is the conventional default. 
A URL can override the port.</p>\n<h2>TLS protects HTTPS</h2>\n<p>For HTTPS over TCP, the TLS handshake happens before the HTTP request is sent. The server presents a certificate chain for the requested host name. The client validates that the certificate is trusted, unexpired, not rejected by policy and valid for the host name.</p>\n<p>The handshake also negotiates cryptographic parameters and establishes keys, after which HTTP bytes are protected for confidentiality and integrity on the network path. For how certificates and chains are validated, see understanding-tls-and-certificates.</p>\n<h2>The HTTP message has method, target, fields and content</h2>\n<p>An HTTP request carries a method, a request target, header fields and optional content. The method says what the client wants to do. Common methods include GET, HEAD, POST, PUT, PATCH and DELETE.</p>\n<p>Header fields carry metadata. Examples include Accept, Content-Type, Authorization, Cookie, If-None-Match and Cache-Control. A request can also carry content, for example a JSON document in a POST request or form data from a browser submission.</p>\n<p>HTTP semantics are shared across protocol versions. HTTP/1.1, HTTP/2 and HTTP/3 encode messages differently on the wire, but the meaning of methods, status codes, header fields and content is defined at the semantic layer.</p>\n<h2>The server selects a handler</h2>\n<p>The server receives the request and maps it to application behaviour. That mapping can use the host, path, method, headers, authenticated identity and content. A reverse proxy, load balancer, CDN or application gateway may process the request before the application sees it.</p>\n<p>The application should not treat the method as decoration. A GET request should be safe to make without changing server state. 
A POST request is normally used when the request asks the server to process submitted content or create a subordinate resource.</p>\n<h2>The response tells the client what happened</h2>\n<p>The response has a status code, header fields and optional content. Status codes are grouped by class. A 2xx status means success, a 3xx status means redirection, a 4xx status means a client side problem and a 5xx status means a server side problem.</p>\n<p>The response headers describe how the client should handle the content. Content-Type identifies the media type. Cache-Control controls caching behaviour. Set-Cookie asks the browser to store a cookie. Location is used with redirects and some creation responses.</p>\n<h2>Redirects can trigger more requests</h2>\n<p>A 3xx response can point the client at another URL. The client may then issue a follow up request. Redirect handling is part of the total request cost because each redirect can require extra DNS, connection and TLS work unless an existing connection can be reused.</p>\n<p>Use redirects deliberately. They are useful for canonical URLs and migrations, but chains of redirects add latency and make failure modes harder to diagnose.</p>\n<h2>Caches can answer without the origin</h2>\n<p>A request may be satisfied by a browser cache, service worker cache, proxy cache or CDN cache. A cache can reuse a stored response when it is still fresh, or when validation with the origin confirms it is still current.</p>\n<p>Validation commonly uses ETag with If-None-Match or Last-Modified with If-Modified-Since. If the origin returns 304 Not Modified, the client reuses the stored response body and applies updated response metadata. For freshness rules and cache layering, see a-practical-guide-to-caching-on-the-web.</p>\n<h2>Connection reuse matters</h2>\n<p>Modern clients avoid opening a new connection for every request when they can. HTTP/1.1 can reuse a TCP connection for multiple requests. 
HTTP/2 and HTTP/3 can multiplex multiple streams over one connection.</p>\n<p>Connection reuse reduces repeated DNS, transport and TLS cost. It also means one page view can share infrastructure work across many assets and API calls.</p>\n<h2>How to debug a request</h2>\n<p>Start by separating the layers. Check DNS resolution first. Then check whether a connection can be opened. Then check TLS certificate validation. Then inspect the HTTP request and response, including method, URL, status code, headers and body.</p>\n<p>Browser developer tools are enough for many frontend issues. For service, proxy and API issues, compare the browser view with server logs and a command line client. Differences often come from cookies, credentials, headers, redirects, content negotiation or cache state.</p>\n<h2>Conclusion</h2>\n<p>An HTTP request is not just a line of application code. It is a chain of resolution, connection setup, optional TLS negotiation, HTTP semantics, server routing, cache decisions and response handling. Debugging gets easier when each layer is checked separately and the request is treated as a complete exchange rather than a single function call.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "web",
        "api",
        "performance"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/how-to-build-deployment-confidence-without-slowing-teams-down",
      "url": "https://soulstack.co.uk/blog/how-to-build-deployment-confidence-without-slowing-teams-down",
      "title": "How to build deployment confidence without slowing teams down",
      "summary": "Deployment confidence does not come from slowing every release until nobody is nervous. It comes from making small changes, moving them through predictable gates, and having relia…",
      "content_html": "<p>Deployment confidence does not come from slowing every release until nobody is nervous. It comes from making small changes, moving them through predictable gates, and having reliable ways to detect and reverse damage.</p>\n<h2>Confidence is built before deployment</h2>\n<p>A deployment should not be the first serious test of a change. Build reproducibility, automated tests, static analysis, dependency checks and environment promotion all create evidence before production.</p>\n<p>That evidence should be attached to the artefact being deployed. Rebuilding different artefacts in each environment weakens traceability. A better pattern is to build once, verify once, then promote the same artefact with environment specific configuration.</p>\n<h2>Smaller changes are safer changes</h2>\n<p>Large releases are harder to review, harder to test and harder to roll back. They also make incident diagnosis slower because many behaviours change at once.</p>\n<p>Teams should optimise for frequent, small deployments. That does not mean every change must be user visible. Feature flags, dark launches and staged enablement let teams deploy code separately from releasing capability.</p>\n<h2>Progressive delivery reduces blast radius</h2>\n<p>Blue green and canary strategies reduce risk by controlling traffic movement. Blue green keeps two production capable environments and shifts traffic between them, leaving the previous environment ready as a rollback target. Canary releases expose a new version to a limited share of real traffic before wider rollout.</p>\n<p>The important part is not the label. The important part is measured exposure. A rollout should have health checks, service level indicators, clear promotion rules and an automated or well rehearsed rollback path.</p>\n<h2>Rollback is a product feature</h2>\n<p>A team that cannot roll back quickly cannot deploy confidently. 
Rollback should be designed and tested, not improvised during an incident.</p>\n<p>Database changes are often the hard part. Backward compatible schema changes, expand and contract migrations, idempotent jobs and version tolerant clients matter more than the deployment tool. The expand and contract approach keeps old and new structures working together while application code moves across, so most steps can be reversed if something goes wrong. The release process should assume that application and data changes may need different recovery strategies.</p>\n<h2>Do not turn gates into queues</h2>\n<p>Manual approvals can be useful for high risk changes, but they are a poor substitute for evidence. A queue of approvals often moves responsibility away from the people who understand the change and towards people who can only check process.</p>\n<p>Good gates are automatic where possible, risk based where necessary and explicit about what they prove. A low risk service change should not wait for the same ceremony as a privileged identity change or an irreversible data migration.</p>\n<h2>Measure the release system</h2>\n<p>Deployment confidence should be measured. Useful signals include deployment frequency, lead time for changes, change failure rate and the time taken to restore service after a failed deployment, alongside rollback success and the number of emergency fixes after release.</p>\n<p>The goal is not to make every metric look good. It is to find where the release system creates delay or hides risk.</p>\n<h2>Conclusion</h2>\n<p>Fast delivery and safe delivery are not opposites. Teams slow down when they lack evidence, small batches, progressive rollout and recovery practice. Build those capabilities into the platform and deployment becomes a routine engineering act, not a scheduled act of hope.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "devops",
        "reliability",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/how-to-review-a-pull-request-well",
      "url": "https://soulstack.co.uk/blog/how-to-review-a-pull-request-well",
      "title": "How to review a pull request well",
      "summary": "A good pull request review protects the codebase without turning review into a personal argument or a slow approval queue. The reviewer checks correctness, maintainability, risk,…",
      "content_html": "<p>A good pull request review protects the codebase without turning review into a personal argument or a slow approval queue. The reviewer checks correctness, maintainability, risk, and clarity, then gives feedback the author can act on.</p>\n<h2>Start with the purpose</h2>\n<p>Read the pull request title and description before reading the diff. Identify what the change claims to do, why it exists, and how it was tested. If the description does not answer those questions, ask for the missing context before guessing.</p>\n<p>A useful description usually includes:</p>\n<ul>\n<li>the problem being solved</li>\n<li>the user visible or operator visible behaviour change</li>\n<li>the main implementation approach</li>\n<li>test evidence</li>\n<li>rollout or migration notes when needed</li>\n<li>known limitations</li>\n</ul>\n<p>The description does not need to be long. It needs to make the review possible.</p>\n<h2>Check the shape of the change</h2>\n<p>Before reviewing line by line, scan the file list. Look for signals that the pull request may be too broad:</p>\n<ul>\n<li>unrelated formatting mixed with behaviour changes</li>\n<li>generated files mixed with source edits</li>\n<li>large renames mixed with logic changes</li>\n<li>changes across several domains without a clear reason</li>\n<li>tests missing for changed behaviour</li>\n</ul>\n<p>When the shape is wrong, ask whether the change can be split. Smaller pull requests are easier to review and safer to merge.</p>\n<h2>Review one file at a time</h2>\n<p>A systematic review reduces missed details. Work through the changed files one by one. On GitHub you can mark a file as viewed to collapse it and track progress, and it is unmarked automatically if the file changes again. 
Leave comments on the smallest useful diff location so the author can see exactly what you mean.</p>\n<p>For each file, ask:</p>\n<ul>\n<li>does this change match the stated purpose?</li>\n<li>is the behaviour correct for normal and edge cases?</li>\n<li>is error handling explicit enough?</li>\n<li>is the code understandable without private context?</li>\n<li>does the test coverage match the risk?</li>\n<li>does the change introduce avoidable coupling?</li>\n<li>are names accurate and consistent?</li>\n</ul>\n<p>Do not spend most of the review on preferences that are already covered by formatting tools. Let automated checks handle mechanical rules where possible.</p>\n<h2>Test the reasoning, not only the syntax</h2>\n<p>Passing checks are necessary, but they are not a full review. Automated tests can miss the wrong requirement, a missing migration, an unsafe default, or a confusing operational behaviour.</p>\n<p>Look for questions such as:</p>\n<ul>\n<li>what happens when the input is empty, malformed, slow, or duplicated?</li>\n<li>what happens when a dependency fails?</li>\n<li>does the change preserve backwards compatibility where it must?</li>\n<li>does it need a migration, feature flag, or staged rollout?</li>\n<li>can the change be observed in logs, metrics, or errors?</li>\n<li>does the failure mode protect data and security boundaries?</li>\n</ul>\n<p>For a risky change, ask the author to add test evidence or explain why a test is not practical. Do not approve because the diff looks plausible.</p>\n<h2>Keep comments actionable</h2>\n<p>A review comment should make clear whether it is blocking, optional, or a question.</p>\n<p>Use direct language:</p>\n<ul>\n<li>&quot;Blocking: this returns success when the write fails. 
Please propagate the error.&quot;</li>\n<li>&quot;Question: should this also handle archived records?&quot;</li>\n<li>&quot;Suggestion: this helper name could describe the unit it returns.&quot;</li>\n</ul>\n<p>Avoid vague comments such as &quot;not sure about this&quot; or &quot;can we make this better&quot;. They force the author to guess what standard they are being held to.</p>\n<h2>Separate correctness from preference</h2>\n<p>Not every improvement should block a merge. Blocking comments should be reserved for issues that affect correctness, security, maintainability, operability, or agreed project standards.</p>\n<p>Preferences can still be useful, but label them as suggestions. A reviewer who blocks on personal style trains authors to optimise for the reviewer rather than the codebase.</p>\n<p>If the project has a style rule, point to the rule or encode it in tooling. If it is not a rule, treat it as a suggestion.</p>\n<h2>Review tests as carefully as production code</h2>\n<p>Tests can hide weak assumptions. Check that tests fail for the right reason before the implementation is fixed, cover the behaviour that changed, and avoid overfitting to implementation details.</p>\n<p>Good tests usually show:</p>\n<ul>\n<li>the important input or state</li>\n<li>the action being tested</li>\n<li>the expected result</li>\n<li>the relevant edge case</li>\n</ul>\n<p>Be cautious with snapshot updates, broad mocks, and tests that assert private implementation details. They can create confidence without protecting behaviour.</p>\n<h2>Check security and data boundaries</h2>\n<p>Every review should include a short security pass. 
The depth depends on the change, but the reviewer should consider:</p>\n<ul>\n<li>input validation</li>\n<li>authorisation checks</li>\n<li>secret handling</li>\n<li>logging of sensitive data</li>\n<li>dependency trust</li>\n<li>permissions for new jobs or workflows</li>\n<li>database migrations and destructive operations</li>\n</ul>\n<p>For high risk areas, require a subject matter reviewer or code owner. General approval is not a substitute for domain expertise.</p>\n<h2>Submit a clear review state</h2>\n<p>When you finish, submit a clear review state. GitHub offers three: approve submits your feedback and approves merging the change, request changes submits feedback that must be addressed before the pull request can be merged, and comment leaves general feedback without approving or blocking.</p>\n<p>Approve when the change is ready to merge under the repository rules. Request changes when there is at least one blocking issue. Comment without approval when the review is incomplete or the questions are non-blocking. Note that request changes is informational unless a branch protection rule or ruleset enforces required reviews.</p>\n<p>A useful summary says what you checked:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">Reviewed the API behaviour, migration path, and tests. The error path for duplicate requests still needs a fix before merge.</span></span></code></pre></div><p>This helps the author and later readers understand the scope of the review.</p>\n<h2>Respond well as an author</h2>\n<p>A good review process needs good author behaviour too. Reply to each comment, explain decisions, and push follow up commits that are easy to inspect. 
If you disagree, explain the trade-off with evidence.</p>\n<p>Do not mark a conversation resolved until the issue has been fixed or the reviewer agrees it does not need a change.</p>\n<h2>Conclusion</h2>\n<p>A strong pull request review is structured, evidence-based, and respectful. Start with the purpose, inspect the shape, review the diff systematically, ask for tests where risk requires them, and make comments actionable. The goal is not to win the review. The goal is to merge a correct, maintainable change with a useful record of the decision.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/idempotency-and-retries-in-distributed-systems",
      "url": "https://soulstack.co.uk/blog/idempotency-and-retries-in-distributed-systems",
      "title": "Idempotency and retries in distributed systems",
      "summary": "Retries are necessary in distributed systems because networks, processes, and dependencies fail in partial and ambiguous ways. The hard part is not deciding whether to retry. The…",
      "content_html": "<p>Retries are necessary in distributed systems because networks, processes, and dependencies fail in partial and ambiguous ways. The hard part is not deciding whether to retry. The hard part is making a retry safe when the caller does not know whether the previous attempt failed before execution, during execution, after execution, or only while returning the response. The answer is to design for repeated intent, with idempotency on the server and disciplined retries on the client.</p>\n<h2>Retries solve transient failure</h2>\n<p>A transient failure is a temporary condition such as a timeout, a dropped connection, a throttled dependency, or a service that is briefly unavailable. Retrying can turn that failure into success without user involvement.</p>\n<p>Retries are harmful when they are unbounded, immediate, or applied to operations that are not safe to repeat. They can amplify load during an outage, keep a failing dependency overloaded, and create duplicate side effects. Retry policy is therefore part of the API contract and the capacity model, not just client convenience.</p>\n<h2>Timeouts create ambiguity</h2>\n<p>A timeout does not prove that the operation failed. It only proves that the caller did not receive a response in time. The service might still be processing the request, the response might have been lost, or the operation might have completed successfully.</p>\n<p>This ambiguity is why idempotency belongs on the server side. A client cannot make a non-idempotent server operation safe by retrying carefully. The server has to recognise repeated intent and produce a stable outcome.</p>\n<h2>Idempotency makes repetition safe</h2>\n<p>An operation is idempotent when making the same request more than once has the same intended effect as making it once. HTTP already defines this for some methods. Under RFC 9110, GET, HEAD, OPTIONS, and TRACE are safe, and PUT and DELETE are idempotent even though they change state. 
POST is neither safe nor idempotent by default.</p>\n<p>Many business operations use POST, and that is where duplicate side effects appear. Creating a payment, submitting an order, sending an invite, provisioning infrastructure, or starting a job can all repeat when retried. For these operations, the usual design is to require an idempotency key, sometimes called a client request token. The server records the key, the request identity, and the result of the first accepted operation. A later request with the same key returns the recorded result instead of executing the side effect again.</p>\n<p>Require keys where duplicate execution would be harmful or expensive. Do not require them for simple reads, and do not use them to hide non-deterministic behaviour in endpoints that should have been modelled with PUT or a stable resource URI.</p>\n<h2>Define the key contract</h2>\n<p>The client generates a unique key for one logical operation and sends the same key on every retry of that operation. A version 4 UUID is a practical format because it is widely supported and needs no central coordination. A random key generated by the caller is often safer than deriving the key from request parameters, because identical parameters do not always mean the same business operation.</p>\n<p>The key needs a scope. Scoping it to the authenticated account, endpoint, and method prevents two clients from colliding by accident and prevents a key used for one operation being replayed against a different one. Storing a hash of the request body alongside the key lets the server detect when the same key is reused with a different intent, which it must reject.</p>\n<p>Document the contract so clients can rely on it. State the maximum key length, the allowed characters, whether keys are case-sensitive, the retention period, and what happens when the same key arrives with a different request body. 
As a real reference point, Stripe carries the key in the Idempotency-Key request header, accepts keys up to 255 characters, recommends UUIDs, and prunes keys after they are at least 24 hours old.</p>\n<h2>Store the result, not just the fact</h2>\n<p>A robust implementation records the in-flight state or the completed result for a key. When the first request completes, later retries with the same key return the same outcome, or an equivalent representation of it.</p>\n<p>Storing only that a key was seen is not enough. If the server marks a key as seen and then fails before completing the operation, every retry can be blocked even though the side effect never happened. If it marks the key after the side effect but before saving the result, the client can still get an unclear response. The record should hold a canonical representation or digest of the relevant request fields, the operation state, the final response, and an expiry time.</p>\n<p>Use a transactional boundary where possible: reserve the key, perform the side effect, and persist the result so it can be replayed safely. The key reservation must be atomic with the decision to proceed. Otherwise two concurrent attempts can both pass the check and duplicate the side effect.</p>\n<h2>Handle concurrent retries</h2>\n<p>Clients, load balancers, and SDKs can retry concurrently. Two requests with the same key may reach different server instances at the same time, so the idempotency store must enforce uniqueness atomically.</p>\n<p>When one request is already in progress for a key, return a clear conflict or retryable response. Do not run the operation twice and hope downstream systems deduplicate it.</p>\n<p>For long-running operations, return an operation resource or status handle. A retried create request can then return the same operation reference while work continues. 
That is clearer than pretending a long operation is always synchronous.</p>\n<h2>Choose retry rules carefully</h2>\n<p>A retry policy should define which errors are retryable, how many attempts are allowed, how long the client waits between attempts, and when the caller gives up. Common retryable cases include timeouts, connection resets, temporary unavailability, and explicit throttling such as 429 and 503. Validation errors, authentication failures, and permanent business rule failures should not be retried without a change.</p>\n<p>Use exponential backoff with jitter for shared dependencies. Backoff reduces pressure. Jitter spreads attempts so large numbers of clients do not retry in lockstep, which is the pattern the Amazon Builders&#39; Library recommends to avoid synchronised retry storms. Add a total deadline so the retry loop cannot outlive the user&#39;s request, the queue visibility timeout, or the business operation window. Clients should also respect Retry-After when the protocol provides it.</p>\n<p>Servers should make retry decisions easier. Return 429 for rate limiting, 503 for temporary overload or maintenance, and a clear error body that says whether the request can be retried.</p>\n<h2>Avoid retry storms</h2>\n<p>When many callers share a dependency, retry traffic becomes part of that dependency&#39;s load. Retries need budgets, concurrency limits, and observability so operators can see when retry behaviour is making an incident worse. A policy that looks fine in unit tests can still fail under load if it multiplies traffic during an outage.</p>\n<h2>Background jobs</h2>\n<p>Jobs should be designed as if they can run more than once. A worker can crash after performing a side effect but before acknowledging the message. A broker can redeliver. A deployment can stop a worker mid task. The right response is not to hope for exactly-once execution. 
It is to make the job idempotent and transactional where possible.</p>\n<p>Use business keys and database constraints for externally visible effects. For example, record that a notification for a specific event and recipient has already been sent, or that a payment capture has already been submitted with a specific operation key.</p>\n<h2>Idempotency is not exactly once</h2>\n<p>Idempotency keys reduce duplicate effects at the API boundary. They do not guarantee exactly-once execution across every downstream system. Message brokers, webhooks, email providers, and payment processors can still retry or duplicate work.</p>\n<p>Downstream consumers should deduplicate using stable event IDs, job IDs, resource IDs, or provider IDs. Idempotency should be layered, not assumed to exist in one place. If the operation is high value, consider returning the created resource URI so clients can recover by lookup after a key expires.</p>\n<h2>What to measure</h2>\n<p>Measure retry counts, retry success rates, timeout rates, throttling responses, duplicate key reuse, mismatched key reuse, and dependency saturation. Log the idempotency key or a safe correlation identifier so you can trace repeated intent. Do not log secrets, payment data, or raw personal data.</p>\n<h2>Conclusion</h2>\n<p>Retries are only safe when repeated intent is part of the design. Use idempotency keys for side-effecting operations such as POST, with a clear key contract, atomic storage, request matching, result replay, and a sensible retention period. Use bounded retries with backoff and jitter for transient faults, and observability that shows when retries help or harm. In distributed systems, duplicate attempts are normal. Design the operation so duplicates do not become duplicate business effects.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "api",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/logging-that-is-actually-useful",
      "url": "https://soulstack.co.uk/blog/logging-that-is-actually-useful",
      "title": "Logging that is actually useful",
      "summary": "Useful logging is not about writing more lines. It is about recording the events that help an engineer understand what happened, who or what was affected, and what to do next. Thi…",
      "content_html": "<p>Useful logging is not about writing more lines. It is about recording the events that help an engineer understand what happened, who or what was affected, and what to do next. This post is about the craft of a good log event: how to structure it, which level to give it, what to record, and what to keep out.</p>\n<h2>Start with the question the log must answer</h2>\n<p>A useful log entry answers an operational question. It should help someone diagnose a fault, investigate a security event, explain a state transition, or prove that an expected action happened.</p>\n<p>Before adding a log line, decide which question it answers. Good examples include:</p>\n<ul>\n<li>Did the request enter the system?</li>\n<li>Which dependency failed?</li>\n<li>Was the operation retried?</li>\n<li>Which customer visible resource was affected?</li>\n<li>Was access denied, and why?</li>\n</ul>\n<p>Bad logs only restate that code executed. A message such as &quot;starting process&quot; is rarely useful unless it marks a lifecycle transition that operators act on.</p>\n<h2>Prefer structured events</h2>\n<p>Plain text is easy to write but hard to query consistently. Structured logs give every important field a stable name, which makes them easier to filter, aggregate, alert on, and join with other signals.</p>\n<p>Use consistent fields across services. At minimum, most application events need:</p>\n<ul>\n<li>timestamp</li>\n<li>level</li>\n<li>service name</li>\n<li>environment</li>\n<li>event name</li>\n<li>request or trace identifier</li>\n<li>operation name</li>\n<li>outcome</li>\n<li>duration when the event represents completed work</li>\n<li>error type when the event represents failure</li>\n</ul>\n<p>Treat the event name as an API. Keep it stable, specific, and low cardinality. Put variable detail in fields, not in the event name.</p>\n<h2>Log outcomes, not every step</h2>\n<p>Log the boundary of meaningful work. 
For example, log that a request completed, that a payment authorisation failed, or that a background job was skipped because its lock was held elsewhere.</p>\n<p>Avoid logging every internal branch. High-volume debug logs hide the events that matter, increase cost, and make incident review slower. Keep debug logs temporary or disabled by default in production.</p>\n<p>A useful production log stream should let an engineer move from symptom to cause without reading a transcript of the whole program.</p>\n<h2>Use levels consistently</h2>\n<p>Levels only help when everyone uses them the same way.</p>\n<p>Use error when the operation failed and needs investigation or user-visible handling. Use warn when the operation completed with degraded behaviour or hit an unusual condition that could become a fault. Use info for normal lifecycle and business-significant events. Use debug for development detail that is not normally retained in production.</p>\n<p>Do not log at error level when the failure is handled completely and leaves no degraded outcome. That creates false alarms. Do not hide failed user-visible work at info level. That makes real incidents harder to find.</p>\n<h2>Include enough context to act</h2>\n<p>A log entry should stand alone. During an incident, the reader may see one event in a search result or alert payload, with no surrounding lines for context.</p>\n<p>Include identifiers that let the reader find the affected request, job, account, resource, tenant, host, deployment, or upstream dependency. Include the decision the service made. Include the observed status code or error class when a dependency failed.</p>\n<p>Do not include secrets, credentials, access tokens, session identifiers, personal data, or full request bodies unless there is a documented and approved reason. 
Logs are copied, indexed, retained, exported, and read by more systems than your application data ever is.</p>\n<h2>Make security events explicit</h2>\n<p>Security relevant events should be easy to find without guessing text fragments. OWASP recommends recording authentication successes and failures, authorisation failures, input and output validation failures, session management failures, user administration actions such as privilege changes, and access to sensitive data.</p>\n<p>Keep these events structured and consistent. Use a stable event name, an outcome, an actor identifier, a target identifier, a source address where appropriate, and a reason code. Never log authentication passwords, access tokens, session identifiers, encryption keys, or connection strings, even when authentication fails. OWASP lists all of these as data to exclude from logs.</p>\n<p>Security logs are not only for attack detection. They are also evidence for audit, investigation, and control validation.</p>\n<h2>Correlate with other signals</h2>\n<p>A log becomes more useful when it can be connected to a trace, request, or metric time series. Propagate a trace or request identifier through the call path and write it to every event created while handling the request.</p>\n<p>Logs, metrics, and traces answer different questions and none replaces the others. The relationship between them is its own topic, covered in the post on metrics, logs, and traces. For an individual event, the practical step is to carry a correlation identifier so the event can be tied back to the wider picture.</p>\n<h2>Keep retention and cost visible</h2>\n<p>Logging has operational cost. High cardinality fields, verbose payloads, and noisy event streams make storage and query systems expensive. They can also slow down incident response by returning too much irrelevant data.</p>\n<p>Set retention by use case. 
Short lived debug detail, operational logs, security audit events, and compliance records often need different retention periods. Make those periods explicit and review them when the service changes.</p>\n<h2>A practical review checklist</h2>\n<p>For each production log event, ask:</p>\n<ul>\n<li>What operational question does this answer?</li>\n<li>Can it be searched by stable fields?</li>\n<li>Does it include the identifiers needed to act?</li>\n<li>Is the level correct?</li>\n<li>Could it leak secrets or sensitive data?</li>\n<li>Is the event volume acceptable during failure?</li>\n<li>Can it be correlated with a trace, request, or metric?</li>\n</ul>\n<p>If the answer is unclear, change the log or remove it.</p>\n<h2>Conclusion</h2>\n<p>Useful logging is designed, not sprinkled through code after something fails. Record structured events at operational boundaries, protect sensitive data, keep levels consistent, and make every event answer a question an engineer will actually ask in production.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "security",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/managing-dotfiles-across-machines",
      "url": "https://soulstack.co.uk/blog/managing-dotfiles-across-machines",
      "title": "Managing dotfiles across machines",
      "summary": "Dotfiles are personal configuration files, usually stored under your home directory or ~/.config. Managing them across machines is a configuration management problem: keep useful…",
      "content_html": "<p>Dotfiles are personal configuration files, usually stored under your home directory or <code>~/.config</code>. Managing them across machines is a configuration management problem: keep useful settings reproducible, keep secrets out of the repository, and make machine-specific differences explicit.</p>\n<h2>Decide what belongs in dotfiles</h2>\n<p>Track configuration you would want to rebuild on a new machine: shell settings, editor configuration, Git configuration, terminal settings and tool defaults.</p>\n<p>Do not track private keys, access tokens, browser profiles, caches, build outputs or generated files. Treat public dotfiles repositories as public documentation. Anything committed there should be safe to publish.</p>\n<p>Separate personal preferences from project policy. Repository-level files such as <code>.editorconfig</code>, formatter settings and CI scripts belong with the project, not in personal dotfiles.</p>\n<h2>Use a repository as the source of truth</h2>\n<p>A Git repository gives you history, review and rollback. Keep the layout predictable. 
One common pattern is to mirror the target home directory structure inside the repository.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>dotfiles/</span></span>\n<span class=\"line\"><span>  git/</span></span>\n<span class=\"line\"><span>    .gitconfig</span></span>\n<span class=\"line\"><span>  shell/</span></span>\n<span class=\"line\"><span>    .bashrc</span></span>\n<span class=\"line\"><span>  editor/</span></span>\n<span class=\"line\"><span>    .config/editor/config</span></span></code></pre></div><p>This layout works well with symlink managers such as GNU Stow, because each package can contain files laid out as they should appear under the target directory.</p>\n<h2>Use GNU Stow for simple symlink management</h2>\n<p>GNU Stow is a symlink farm manager. 
It takes separate package directories and makes them appear under a target directory by creating symbolic links.</p>\n<p>With a repository like this:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>dotfiles/</span></span>\n<span class=\"line\"><span>  git/</span></span>\n<span class=\"line\"><span>    .gitconfig</span></span>\n<span class=\"line\"><span>  shell/</span></span>\n<span class=\"line\"><span>    .bashrc</span></span></code></pre></div><p>Run Stow from the repository and target the home directory.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79C0FF\">cd</span><span style=\"color:#A5D6FF\"> ~/dotfiles</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">stow</span><span style=\"color:#79C0FF\"> --target=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#A5D6FF\"> git</span><span style=\"color:#A5D6FF\"> shell</span></span></code></pre></div><p>The files in <code>~/dotfiles/git</code> and <code>~/dotfiles/shell</code> then appear in the home directory through symbolic links.</p>\n<p>Stow is a good fit when machines can share mostly the same files and you want minimal tooling. Its main constraint is that the repository layout and the target layout are tightly connected.</p>\n<h2>Use chezmoi when machines differ</h2>\n<p>chezmoi manages dotfiles by storing the desired state and applying it to each machine. 
It offers features beyond symlinking, including templates and machine-specific configuration.</p>\n<p>A typical workflow starts by adding an existing file, editing the source state, then applying it.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> add</span><span style=\"color:#A5D6FF\"> ~/.gitconfig</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> edit</span><span style=\"color:#A5D6FF\"> ~/.gitconfig</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> apply</span></span></code></pre></div><p>chezmoi is a good fit when laptops, servers and workstations need different values in the same logical file. Templates can express those differences without maintaining entirely separate copies.</p>\n<p>Use templates sparingly. A simple copied file is easier to inspect than a template with many branches.</p>\n<h2>Keep secrets out of plain text</h2>\n<p>Never commit private SSH keys, API tokens or production credentials. Use a password manager, secret manager or an encrypted workflow designed for secrets.</p>\n<p>For dotfiles, prefer references to secret material rather than the secret itself. 
For example, configure a tool to read a token from an environment variable or credential helper.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">export</span><span style=\"color:#E6EDF3\"> TOOL_CONFIG</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">/.config/tool/config\"</span></span></code></pre></div><p>Audit before pushing. Use Git status, Git diff and secret scanning where available.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> status</span><span style=\"color:#79C0FF\"> --short</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#79C0FF\"> --cached</span></span></code></pre></div><h2>Handle machine-specific differences explicitly</h2>\n<p>Do not rely on accidental hostnames or manual edits that are never committed. 
Record differences in a clear place.</p>\n<p>Common patterns include:</p>\n<ul>\n<li>separate packages, such as <code>work</code>, <code>personal</code> and <code>server</code></li>\n<li>templates that branch on operating system or hostname</li>\n<li>small local include files that are intentionally ignored by Git</li>\n</ul>\n<p>For shell configuration, keep local overrides isolated.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#8B949E\"># In .bashrc</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">if</span><span style=\"color:#E6EDF3\"> [ </span><span style=\"color:#FF7B72\">-f</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">/.bashrc.local\"</span><span style=\"color:#E6EDF3\"> ]; </span><span style=\"color:#FF7B72\">then</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  .</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">/.bashrc.local\"</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">fi</span></span></code></pre></div><p>Then ignore the local file.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>.bashrc.local</span></span></code></pre></div><p>This keeps the shared file stable while allowing a machine to carry private or temporary settings.</p>\n<h2>Make bootstrap boring</h2>\n<p>A new machine should need only a short, documented bootstrap path. 
Keep the first run safe and repeatable.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> clone</span><span style=\"color:#A5D6FF\"> https://example.com/dotfiles.git</span><span style=\"color:#A5D6FF\"> ~/dotfiles</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">cd</span><span style=\"color:#A5D6FF\"> ~/dotfiles</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">stow</span><span style=\"color:#79C0FF\"> --target=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#A5D6FF\"> shell</span><span style=\"color:#A5D6FF\"> git</span></span></code></pre></div><p>For chezmoi, use its init and apply workflow.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> init</span><span style=\"color:#A5D6FF\"> https://example.com/dotfiles.git</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> apply</span></span></code></pre></div><p>Do not let bootstrap scripts install large amounts of software without confirmation. Configuration restore and package installation are related, but they have different failure modes.</p>\n<h2>Review changes before applying them</h2>\n<p>Dotfile changes can break login shells, editors and Git authentication. 
Review generated changes before applying them, especially on a remote host.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">stow</span><span style=\"color:#79C0FF\"> --no</span><span style=\"color:#79C0FF\"> --verbose</span><span style=\"color:#79C0FF\"> --target=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">$HOME</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#A5D6FF\"> shell</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">chezmoi</span><span style=\"color:#A5D6FF\"> diff</span></span></code></pre></div><p>A dry run is valuable because dotfiles usually affect the interactive recovery tools you depend on. Breaking an editor is inconvenient. Breaking a shell startup file on a remote server can be worse.</p>\n<h2>Keep configuration portable, not identical</h2>\n<p>The goal is repeatability, not forcing every machine to be the same. A workstation, a laptop and a server have different roles. Dotfile tooling should make shared defaults easy and local differences obvious.</p>\n<p>Prefer small files, comments for non-obvious choices and documented bootstrap commands. Remove old settings when the tool they configure is no longer used.</p>\n<h2>Conclusion</h2>\n<p>Good dotfile management is simple configuration management. Put safe, useful settings in version control, keep secrets out, choose Stow for straightforward symlink workflows and choose chezmoi when machines need templated differences. The best setup is the one you can review before applying and rebuild without relying on memory.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "cli",
        "git",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/managing-secrets-in-applications",
      "url": "https://soulstack.co.uk/blog/managing-secrets-in-applications",
      "title": "Managing secrets in applications",
      "summary": "Application secrets include API keys, database credentials, signing keys, private keys, service account tokens and certificates. They need deliberate management because one leaked…",
      "content_html": "<p>Application secrets include API keys, database credentials, signing keys, private keys, service account tokens and certificates. They need deliberate management because one leaked secret can give an attacker direct access to systems and data. The work is a lifecycle: identify a secret, store it safely, retrieve it through a managed system, rotate it on a schedule, and revoke it fast when something goes wrong.</p>\n<h2>Know what counts as a secret</h2>\n<p>A secret is any value that grants access, signs data, decrypts data or proves identity. Common examples include database passwords, OAuth client secrets, cloud access keys, webhook signing secrets, SSH keys, TLS private keys and encryption keys.</p>\n<p>Configuration values are not all secrets. A public URL or feature flag may belong in ordinary configuration. A token that can reach production data does not. Classify secrets clearly so they get stronger handling than ordinary settings.</p>\n<h2>Do not hardcode secrets</h2>\n<p>Do not put secrets in source code, container images, client side bundles, test fixtures or documentation examples. Source control keeps history, so deleting a committed secret from the latest version does not remove it. The value still sits in earlier commits.</p>\n<p>When a secret is committed, treat it as exposed and rotate it. Rewriting history with a tool like git filter-repo or BFG Repo-Cleaner addresses hygiene and compliance, but it does not undo the exposure on its own. Rotation is what actually removes the risk.</p>\n<p>Use secret scanning in the IDE, in source control and in CI to catch mistakes early. Scanning is a safety net, not permission to store secrets in code.</p>\n<h2>Use a secrets manager</h2>\n<p>A secrets manager centralises storage, access control, auditing and rotation. 
It also gives applications a consistent way to retrieve secrets at runtime instead of reading them from scattered files.</p>\n<p>Choose a retrieval design that fits the runtime. Server applications can fetch secrets from a managed service or a local agent, and a sidecar that refreshes the value periodically keeps a valid credential in memory without a redeploy. CI jobs can request short lived credentials for a single run. Client side code should not receive server side secrets at all.</p>\n<p>Avoid spreading production secrets through environment files, chat messages or manual setup notes. Those channels are hard to audit and hard to clean up.</p>\n<h2>Prefer short lived credentials</h2>\n<p>Static credentials are easy to copy and hard to contain. Prefer short lived or dynamic credentials issued by identity based access, workload identity or federation. A dynamic database credential, for example, expires on its own, so a stolen value stops working without any manual cleanup.</p>\n<p>Short lived credentials reduce the useful life of a leak and make rotation less disruptive, because the application already expects renewal. Pair them with least privilege so each credential carries only the access its workload needs.</p>\n<p>For long lived secrets that cannot be removed, document ownership, purpose, expiry, rotation steps and emergency revocation steps.</p>\n<h2>Rotate and revoke safely</h2>\n<p>Rotation is the planned replacement of a secret. Revocation is the removal of trust from a secret, usually during an incident.</p>\n<p>Design applications so rotation is routine. That may mean supporting multiple active keys during a transition, reloading secrets without a full deployment, and testing the rollback path before you need it.</p>\n<p>When a secret is suspected to be exposed, revoke or rotate it promptly, then check logs for use of the old value. 
Log attempts to use a revoked secret, so that any continued use of a leaked value shows up after revocation.</p>\n<h2>Protect secrets in logs and telemetry</h2>\n<p>Secrets often leak through error messages, debug logs, request dumps, crash reports and analytics events. A secrets manager does not help if the application prints the value on the next line.</p>\n<p>Redact or mask known secret fields before logging. Avoid logging full headers, connection strings, tokens or signed URLs. Treat observability pipelines as sensitive systems, because they may receive secrets by accident.</p>\n<h2>Conclusion</h2>\n<p>Secrets management is a lifecycle problem, not a place to paste passwords. Identify what counts as a secret, keep it out of code, retrieve it through a managed system, prefer short lived credentials, make rotation a normal operation, and keep secrets out of your logs.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "devops",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/metrics-logs-and-traces-the-three-pillars",
      "url": "https://soulstack.co.uk/blog/metrics-logs-and-traces-the-three-pillars",
      "title": "Metrics, logs and traces: the three pillars",
      "summary": "Metrics, logs and traces are three different ways to understand a running system. They overlap, but they answer different questions and work best when designed together rather tha…",
      "content_html": "<p>Metrics, logs and traces are three different ways to understand a running system. They overlap, but they answer different questions and work best when designed together rather than bolted on one at a time.</p>\n<h2>What each signal is for</h2>\n<p>Metrics are measurements captured over time. They are compact, cheap to query, and well suited to dashboards, alerting, trends, and service level objectives.</p>\n<p>Logs are records of events. They explain decisions, state changes, failures, and security relevant activity. They are useful when an engineer needs detail that is too specific or too rare to capture in a metric.</p>\n<p>Traces show the path of work through a system. They are especially useful in distributed systems where one user request crosses many services, queues, and dependencies.</p>\n<p>A mature setup uses all three. Metrics show that something is wrong. Traces help locate where the work slowed down or failed. Logs explain what happened at a specific point.</p>\n<h2>Metrics answer how much and how often</h2>\n<p>Metrics are the best signal for service health. A practical baseline for any user facing path is the four golden signals: latency, traffic, errors, and saturation.</p>\n<p>Use metrics for questions such as:</p>\n<ul>\n<li>What proportion of requests failed?</li>\n<li>How long did successful requests take?</li>\n<li>Is the queue growing?</li>\n<li>Is the database connection pool exhausted?</li>\n<li>Are we close to a quota or capacity limit?</li>\n</ul>\n<p>Use labels carefully. Labels such as service, route, method, outcome, region, and dependency are often useful. Labels such as user identifier, request identifier, order identifier, or full URL can create unbounded cardinality and damage the monitoring system.</p>\n<p>Metrics should be stable enough to alert on. 
If a metric name or label changes every release, it is not a reliable operational contract.</p>\n<h2>Logs answer what happened</h2>\n<p>Logs are best for discrete events. A useful entry records a meaningful fact: a request completed, a job failed permanently, a dependency returned an unexpected response, or a security decision was made.</p>\n<p>Use logs for questions such as:</p>\n<ul>\n<li>Which resource was affected?</li>\n<li>What did the service decide to do?</li>\n<li>Which dependency returned the error?</li>\n<li>Was the request rejected by validation, authentication, or authorisation?</li>\n<li>What retry or fallback path was used?</li>\n</ul>\n<p>Keep logs structured and include the trace identifier so a log line can lead back to a trace. That is enough for this post. How to make individual log entries genuinely useful is its own topic.</p>\n<h2>Traces answer where time and failure moved</h2>\n<p>A trace follows one unit of work across process boundaries. Each span represents an operation within that trace. A trace can show that a request spent most of its time in a downstream API, a database query, a queue wait, or application code.</p>\n<p>Use traces for questions such as:</p>\n<ul>\n<li>Which service added latency?</li>\n<li>Which dependency failed first?</li>\n<li>Did retries make the request slower?</li>\n<li>Did parallel calls run as expected?</li>\n<li>Which path did this request take through the system?</li>\n</ul>\n<p>Traces are only useful when context is propagated consistently across services. The W3C Trace Context standard defines the traceparent and tracestate HTTP headers for exactly this, so that every compliant tool can read and forward the same trace identity. Without propagation, traces break at service boundaries and become isolated fragments.</p>\n<h2>Correlation matters more than volume</h2>\n<p>Collecting all three signals is not enough. 
They must share enough context to be used together.</p>\n<p>The most important link is a trace or request identifier. A dashboard should lead to a trace. A trace should lead to relevant logs. Logs should include the identifiers needed to find related metrics, resources, and deployments.</p>\n<p>Standard names also help. Shared conventions reduce the translation work between teams, libraries, and tools. The goal is not to make every service identical. The goal is to make the common parts predictable.</p>\n<h2>Alert from symptoms, investigate with detail</h2>\n<p>Alerts should usually come from user visible symptoms, not from every internal cause. A high error rate, a failed availability objective, rising tail latency, or exhausted critical capacity is worth attention because it describes impact or imminent impact.</p>\n<p>Logs and traces are usually better for investigation than paging. A single error log line rarely proves user impact. A failed internal call might be retried successfully. A slow dependency might affect only a low priority background job.</p>\n<p>Use the alert to bring someone to the problem. Use logs and traces to help them solve it.</p>\n<h2>Avoid common design mistakes</h2>\n<p>Do not use logs as metrics. Counting log lines is fragile because log volume changes with code paths, sampling, and level configuration.</p>\n<p>Do not use metrics as forensic records. A counter can show that failures increased, but it cannot explain the exact request, actor, or decision.</p>\n<p>Do not use traces as a replacement for service health monitoring. Sampling, retention, and backend cost usually make traces unsuitable as the only source for alerting.</p>\n<p>Do not collect high volume telemetry without an owner. 
Every signal needs a reason to exist, a retention policy, and a review path when cost or noise grows.</p>\n<h2>A practical starting set</h2>\n<p>For a typical HTTP service, start with:</p>\n<ul>\n<li>request count by route, method, status class, and outcome</li>\n<li>request latency by route and outcome</li>\n<li>dependency latency and error count by dependency and operation</li>\n<li>saturation metrics for CPU, memory, queues, workers, and connection pools</li>\n<li>structured logs for request completion, permanent job failure, security decisions, and dependency failures</li>\n<li>traces for inbound requests and significant outbound calls</li>\n</ul>\n<p>Then add service specific signals only when they answer a real operational question.</p>\n<h2>Conclusion</h2>\n<p>Metrics, logs and traces work best as a connected system. Metrics show the health of the service, traces show the path of work, and logs explain the events and decisions. Design them together, correlate them with stable identifiers, and alert on symptoms that matter to users.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "devops",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/pagination-filtering-and-sorting-done-right",
      "url": "https://soulstack.co.uk/blog/pagination-filtering-and-sorting-done-right",
      "title": "Pagination, filtering and sorting done right",
      "summary": "Pagination, filtering and sorting are part of the API contract. Treat them as first-class design choices, because small inconsistencies in collection endpoints quickly become expe…",
      "content_html": "<p>Pagination, filtering and sorting are part of the API contract. Treat them as first-class design choices, because small inconsistencies in collection endpoints quickly become expensive client bugs.</p>\n<h2>Paginate every growing collection</h2>\n<p>Any collection that can grow beyond a small bounded size should be paginated from the start. Pagination protects the service from unbounded reads and protects clients from large responses, timeouts, and memory pressure.</p>\n<p>Do not add pagination only after the first large customer arrives. Retrofitting pagination is a breaking change when clients already expect a complete list from one request.</p>\n<h2>Prefer cursor pagination for changing data</h2>\n<p>Offset pagination is simple: the client asks for <code>offset=100&amp;limit=50</code> or <code>page=3&amp;pageSize=50</code>. It works for small, stable collections and admin screens, but it can produce duplicates or gaps when records are inserted or removed while the client is paging.</p>\n<p>Cursor pagination uses an opaque token that represents a position in a stable ordering. It is better for active datasets, event streams, feeds, and high-volume collections. The token should be treated as an implementation detail. Clients should store and send it, not parse it.</p>\n<h2>Use deterministic ordering</h2>\n<p>Pagination is only reliable when ordering is deterministic. If the client sorts by a non-unique field, add a stable tie-breaker internally, such as the resource identifier or creation timestamp plus identifier. Without a tie-breaker, two rows with the same sort value can move between pages.</p>\n<p>Document the default ordering. Do not change it silently. A default sort change can break clients that process pages incrementally.</p>\n<h2>Keep filters explicit and bounded</h2>\n<p>Filtering should use documented fields and operators. 
Avoid accepting arbitrary database expressions, raw SQL fragments, or implementation-specific field names. Those designs increase injection risk, leak internals, and make index planning difficult.</p>\n<p>Support the filters clients actually need. Common examples are equality filters, date ranges, status values, ownership, and prefix search. For each filter, document the field, type, allowed operators, case sensitivity, time zone handling, and whether multiple values mean AND or OR.</p>\n<p>Return <code>400</code> for unsupported filters rather than ignoring them. Silent ignore behaviour makes clients believe they are seeing a narrowed result when they are not.</p>\n<h2>Make sorting predictable</h2>\n<p>Use a documented sort syntax and keep it consistent across collection endpoints. A compact approach is <code>sort=createdAt,-id</code>, where a leading hyphen means descending order. Another approach is separate fields such as <code>orderBy=createdAt desc,id desc</code>. Either can work. The important part is that it is documented, validated, and consistent.</p>\n<p>Reject unsupported sort fields. Do not pass sort input directly to a database query. Map public sort names to approved internal columns or expressions.</p>\n<h2>Return navigation information</h2>\n<p>A paginated response should tell the client how to get the next page. Cursor APIs can return <code>nextPageToken</code> in the body, a <code>Link</code> header with <code>rel=&quot;next&quot;</code>, or both. Header links are standard HTTP, while body tokens are often easier for SDKs and application developers. Whichever approach you choose, make it consistent.</p>\n<p>Avoid requiring clients to construct the next URL by copying all filters, sort fields, and cursor state themselves. The server knows the correct next position. Give it back to the client.</p>\n<h2>Be careful with totals</h2>\n<p>A total count can be useful, but it is not always cheap or stable. 
Counting a large filtered dataset can be more expensive than returning a page. In active datasets, the count can change before the client reaches the next page.</p>\n<p>If totals are provided, document whether they are exact, approximate, or omitted for expensive queries. Do not make every page wait for an exact count unless the product requirement justifies the cost.</p>\n<h2>Preserve query consistency across pages</h2>\n<p>The filters, sort order, page size, and authorisation context must remain consistent across page requests. If a cursor token encodes query state, reject attempts to reuse it with different filters or sort options. If it does not encode query state, clients must resend the same query parameters with the cursor.</p>\n<p>Do not let clients change sort order halfway through a cursor sequence. That creates unclear boundaries and duplicate processing.</p>\n<h2>Design for limits and failure</h2>\n<p>Set a default page size and a maximum page size. Return clear validation errors when the requested size is too large. Use stable error formats so clients can distinguish invalid query parameters from temporary server failures.</p>\n<p>For expensive filters, prefer an explicit <code>400</code> or <code>422</code> validation error over timing out. If a query shape is unsupported at scale, say so in the contract.</p>\n<h2>Conclusion</h2>\n<p>Good collection design is not just <code>limit</code> and <code>offset</code>. It requires bounded page sizes, stable ordering, documented filters, validated sorting, reliable navigation, and clear behaviour around totals. Cursor pagination is usually the safer default for changing data, while offset pagination is best reserved for small or stable datasets.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "api",
        "architecture",
        "web"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/platform-engineering-is-a-product-problem-not-a-kubernetes-problem",
      "url": "https://soulstack.co.uk/blog/platform-engineering-is-a-product-problem-not-a-kubernetes-problem",
      "title": "Platform engineering is a product problem, not a Kubernetes problem",
      "summary": "Platform engineering fails when it is treated as a tooling programme. Kubernetes, Terraform, pipelines and portals can be useful, but they are not the product. The product is a re…",
      "content_html": "<p>Platform engineering fails when it is treated as a tooling programme. Kubernetes, Terraform, pipelines and portals can be useful, but they are not the product. The product is a reliable path from idea to production that development teams can use without needing to understand every internal system behind it.</p>\n<h2>The platform is the interface, not the estate</h2>\n<p>A platform should reduce the number of decisions a product team must make before it can ship safely. That does not mean hiding every detail. It means exposing stable, documented capabilities with clear ownership, clear support boundaries and sensible defaults.</p>\n<p>The common failure mode is building a catalogue of infrastructure components and calling it a platform. A catalogue is only useful when it is shaped around user journeys. A developer does not wake up wanting a cluster, an ingress rule or an IAM policy. They want to create a service, deploy it, observe it, recover it and change it without waiting on another team.</p>\n<p>That is a product problem. The platform team needs users, adoption metrics, feedback loops, roadmaps, deprecation plans and support data. Without those, it is just another internal toolchain.</p>\n<h2>Kubernetes is an implementation detail</h2>\n<p>Kubernetes can be a strong substrate for a platform, but it should rarely be the primary developer experience. If every team must become fluent in Kubernetes primitives before shipping a simple service, the platform has moved complexity sideways rather than reducing it.</p>\n<p>The right abstraction depends on the organisation. A team building low level infrastructure may need direct access to Kubernetes APIs. A product team building a routine web service may need a service template, a deployment contract, runtime limits, observability defaults and a clear way to request exceptions.</p>\n<p>The platform should make the common path easy and the unsafe path visible. 
It should not pretend that one interface suits every workload.</p>\n<h2>Treat adoption as evidence</h2>\n<p>A platform is only working when teams choose it because it is faster, safer and more predictable than doing the work themselves. Mandated adoption can hide bad design. Voluntary adoption exposes whether the platform solves a real problem.</p>\n<p>Useful signals include time to first deploy, deployment frequency, failed deployment recovery time, support tickets per service, repeated manual steps and policy exception volume. Read alongside the share of services using the supported path, these signals show whether the platform is genuinely reducing work.</p>\n<p>Those metrics should not become vanity reporting. They should feed backlog decisions. If teams keep bypassing the same capability, the response should be product discovery, not blame.</p>\n<h2>Build opinionated paths, not cages</h2>\n<p>A supported default path is valuable when it encodes a good decision once so that teams do not have to make it again. It becomes a cage when it blocks valid needs or forces teams into unsuitable patterns. The practical mechanics of building these paths deserve their own treatment, so this is only a brief mention here.</p>\n<p>The platform should separate standards from preferences. Standards cover security, ownership, logging, auditability, deployment traceability, rollback and operational readiness. Preferences cover framework choice, internal library style and implementation taste. Confusing the two creates unnecessary friction.</p>\n<p>Exceptions should be designed into the system. A mature platform can say: this is the supported path, this is the review process for leaving it, and this is what support changes when you do.</p>\n<h2>Platform teams need product discipline</h2>\n<p>Product discipline means writing down who the users are, what jobs they need done, what is supported, what is not supported and how success is measured. 
It also means saying no.</p>\n<p>A platform team that accepts every request becomes a shared services queue. A platform team that only builds for its own technical interests becomes irrelevant. The useful middle is a team that finds repeated problems, turns them into reusable capabilities and removes work from product teams without removing ownership from them.</p>\n<h2>Conclusion</h2>\n<p>Platform engineering is not a Kubernetes adoption strategy. It is a product strategy for internal engineering capability. The best platform teams do not optimise for how much infrastructure they expose. They optimise for how much safe, repeatable delivery they make possible.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "devops",
        "architecture",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/platform-team-guide-to-golden-paths-developers-do-not-hate",
      "url": "https://soulstack.co.uk/blog/platform-team-guide-to-golden-paths-developers-do-not-hate",
      "title": "The platform team guide to golden paths that developers do not hate",
      "summary": "A golden path is meant to make the right way the easy way. Developers come to resent golden paths when they feel like paperwork, hide details that matter, block valid use cases, o…",
      "content_html": "<p>A golden path is meant to make the right way the easy way. Developers come to resent golden paths when they feel like paperwork, hide details that matter, block valid use cases, or seem to exist mainly so the platform team can claim everything is standardised.</p>\n<h2>Start with a real journey</h2>\n<p>A golden path should map to a complete developer journey, not a tool category. A real journey includes creating a service, deploying a change, adding a dependency, exposing an endpoint, viewing logs, rolling back, rotating a secret, and responding to an alert.</p>\n<p>If a path only creates a repository and leaves the team to discover the rest, it is a template, not a golden path. The opinionated, supported route should carry a team through the work it actually has to do, not just the first ten minutes of it.</p>\n<h2>Remove decisions, not ownership</h2>\n<p>The value of a golden path is reducing repeated decisions. It should provide sensible defaults for build, test, deploy, logging, metrics, security checks, ownership metadata, and operational readiness, so each team is not relitigating the same setup.</p>\n<p>It should not remove engineering ownership. Teams still own their service behaviour, their dependencies, their reliability targets, and their production outcomes. The path supplies paved roads. It does not turn product teams into passengers who can shrug off what happens after deploy.</p>\n<h2>Make the contract visible</h2>\n<p>Developers need to know what the platform actually promises. Is the path supported in production? Which languages are maintained? How are updates rolled out? What happens when the generated template changes underneath them? What support does a team lose if it modifies the path?</p>\n<p>Without a visible contract, teams either over-trust the path and assume guarantees that were never offered, or they avoid it entirely because they cannot tell what they are signing up for. 
A written, honest contract is what separates a golden path from a suggestion.</p>\n<h2>Design the escape hatch</h2>\n<p>No golden path covers every valid workload. The escape hatch should be documented, reviewed, and observable. A team should know how to request an exception, what evidence is required, and how the decision affects the support it receives.</p>\n<p>This prevents two bad outcomes. The first is shadow platforms built in frustration when teams feel they have no sanctioned way out. The second is platform sprawl, where every special case gets absorbed into the default path until the default no longer means anything. A clear exception process keeps both in check.</p>\n<h2>Keep paths alive</h2>\n<p>A golden path is a maintained product, not a one-off scaffold. It needs versioning, migration support, a deprecation policy, and usage telemetry. The first version is rarely the hard part. The hard part is keeping hundreds of services aligned without breaking teams and without freezing the platform in place.</p>\n<p>Automated upgrades help only when they are predictable. Teams need release notes, changes they can test before adopting, and a way to defer a high-risk update briefly without falling off support forever. Trust in the path erodes fast the first time an upgrade lands unannounced and breaks production.</p>\n<h2>Measure friction honestly</h2>\n<p>Adoption alone is not enough. A mandated path can show high adoption and low trust at the same time. More honest signals include time to first production deployment, support requests per service, manual steps per release, exception rate, upgrade lag, and developer satisfaction from targeted surveys rather than vague sentiment.</p>\n<p>Some of the most useful feedback comes from teams that rejected the path. They can show exactly where the default failed a real need, which is the information you need to improve it.</p>\n<h2>Conclusion</h2>\n<p>Developers do not hate standards. 
They hate standards that slow them down without improving outcomes. A golden path works when it covers a real journey, states its contract, supports exceptions, and stays maintained. The path is a success when teams choose it because it makes delivery easier and safer, not because they were told they had no other option.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "devops",
        "architecture",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/principle-of-least-privilege-in-practice",
      "url": "https://soulstack.co.uk/blog/principle-of-least-privilege-in-practice",
      "title": "Principle of least privilege in practice",
      "summary": "The principle of least privilege means every user, service and process should have only the access it needs to do its intended work, and nothing more. In practice that means desig…",
      "content_html": "<p>The principle of least privilege means every user, service and process should have only the access it needs to do its intended work, and nothing more. In practice that means designing access so the safe default is no access, then granting narrow permissions for specific tasks. Done well, it shrinks the blast radius when a credential leaks, a service is compromised or someone makes a mistake.</p>\n<h2>Start with denial by default</h2>\n<p>Access control should start closed. A new user, service account, API route or job should receive no sensitive access until a deliberate rule grants it. This is the model OWASP recommends for avoiding broken access control: except for genuinely public resources, deny by default.</p>\n<p>A closed default is easier to reason about than broad access with exceptions. Exceptions accumulate over time, and a missing block in a deny list is silent. A missing grant in an allow list simply fails, which is the behaviour you want. Prefer allow lists for sensitive actions over rules that permit everything except a handful of blocked cases.</p>\n<h2>Separate users, services and environments</h2>\n<p>Different actors need different permissions, so do not collapse them onto one credential. A human administrator, a web application, a background worker and a deployment job each have a distinct job, and each should carry its own identity.</p>\n<p>Give separate workloads separate service accounts. Give staging, test and production their own credentials and resources, so a mistake in a non-production environment cannot reach production. Where the platform supports it, separate read, write and administrative permissions rather than handing out a single all-powerful role.</p>\n<p>The payoff is containment. 
When one identity is misused, the damage is bounded by what that identity could already do, not by what the whole system can do.</p>\n<h2>Grant permissions for tasks, not convenience</h2>\n<p>Permissions should map to a real task. If a service only reads from one bucket, it should not have write access to every bucket. If a support role only needs to view account status, it should not be able to change billing details.</p>\n<p>Convenience access is dangerous because it outlives its reason. Someone grants broad access to unblock a deadline, the deadline passes, and the grant stays. Give temporary access an explicit expiry and a named owner so it is removed on a schedule rather than forgotten.</p>\n<h2>Apply least privilege in code</h2>\n<p>Least privilege is not only an infrastructure control. Application code needs it too, and access control is only effective when it runs in trusted server-side code where an attacker cannot tamper with the check.</p>\n<p>Authorise every protected action on the server. Keep administrative functions behind explicit permission checks, and do not assume that a user who can load a page is allowed to call every action behind it. Hiding a button is not access control.</p>\n<p>Enforce object-level checks. A user may be allowed to read invoices, but only invoices that belong to their account or tenant. Verifying record ownership on every lookup is the defence against the most common form of broken access control, where an identifier from user input is trusted without checking who owns the record.</p>\n<h2>Apply least privilege to data</h2>\n<p>Data access should be narrow. Components usually need only a subset of tables, schemas, buckets or queues, not unrestricted access to the store.</p>\n<p>Give each component its own database role scoped to what it actually does. A reporting job may need read-only access. A migration job may need to change schema. 
The main application should rarely need unrestricted administrative database access at runtime, because that level of access turns a single application bug into a route to the entire database. Scoping the role keeps the blast radius small.</p>\n<h2>Review access regularly</h2>\n<p>Permissions drift. People change roles, services are retired, and emergency access quietly becomes permanent. Least privilege is a state you maintain, not a setting you apply once.</p>\n<p>Review high-risk permissions on a schedule. Remove unused roles, stale accounts and old service credentials, and compare granted access against actual use in your logs to find permissions no one exercises. Automated detection helps here: flag wildcard permissions, inactive accounts and shared credentials before they become the path an attacker takes.</p>\n<h2>Design for break-glass access</h2>\n<p>Least privilege should not block incident response. Create a controlled break-glass process for genuine emergencies, so that tightening day-to-day access does not leave responders stuck.</p>\n<p>Break-glass access should be rare, monitored, time-limited and reviewed after every use. It is an audited exception, not the normal way to work around missing permissions. If people reach for it routinely, that is a signal that the everyday grants are too narrow in the wrong places, and that is what to fix.</p>\n<h2>Conclusion</h2>\n<p>Least privilege reduces the blast radius of mistakes and compromises. Start with denial by default, separate identities for users, services and environments, grant access for specific tasks rather than convenience, enforce server-side and object-level checks in code, scope data access per component, and keep reviewing permissions as the system changes. Pair it with a tightly controlled break-glass path so security and incident response can coexist.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "architecture",
        "web"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/queues-and-background-jobs-the-basics",
      "url": "https://soulstack.co.uk/blog/queues-and-background-jobs-the-basics",
      "title": "Queues and background jobs: the basics",
      "summary": "Queues move work out of the synchronous request path and into worker processes. They are useful when work is slow, bursty, retryable, or not required before the user receives a re…",
      "content_html": "<p>Queues move work out of the synchronous request path and into worker processes. They are useful when work is slow, bursty, retryable, or not required before the user receives a response. They are not a shortcut for reliability. A queue adds another distributed system, with delivery semantics, backpressure, monitoring, and failure handling.</p>\n<h2>Core terms</h2>\n<p>A producer creates a message. A broker stores or routes the message. A queue holds messages until workers can process them. A consumer or worker receives a message and performs the task. An acknowledgement tells the broker that the message has been handled.</p>\n<p>The unit of work should be small enough to retry safely and large enough to be meaningful. A job that does too much is hard to recover. A job that does too little can create coordination overhead and excessive queue traffic.</p>\n<h2>Why use a queue</h2>\n<p>A queue is useful when the caller does not need the result immediately. Common examples include sending notifications, generating reports, importing data, resizing images, synchronising with another system, and running maintenance tasks.</p>\n<p>Queues also smooth bursts. A web tier might receive a spike of requests, enqueue work quickly, and let workers drain the backlog at a controlled rate. This protects dependencies from sudden load, but only if the queue length, worker concurrency, and downstream capacity are managed.</p>\n<h2>Delivery is not exactly once</h2>\n<p>Most practical job systems should be treated as at-least-once delivery. A job can run more than once. A worker can crash after the side effect but before acknowledgement. A timeout can make the broker deliver the same message again. A deployment can interrupt a worker.</p>\n<p>Make jobs idempotent so that running the same message twice does not corrupt state. Lean on stable business identifiers, database uniqueness constraints, and state checks before acting. 
Do not rely on the queue to prevent all duplicates. The detailed contract for idempotency keys on external calls is its own subject and is covered separately.</p>\n<h2>Acknowledgements</h2>\n<p>Acknowledgement timing changes failure behaviour. If a worker acknowledges before doing the work, a crash can lose the job. If it acknowledges after doing the work, a crash can cause redelivery and duplicate execution. Late acknowledgement is safer only when the job is idempotent.</p>\n<p>Workers should handle shutdown deliberately. On termination, a worker should stop accepting new work, finish or abandon current work according to the broker contract, and leave the system in a state where incomplete work can be retried.</p>\n<h2>Durability</h2>\n<p>Durability needs both broker configuration and message publishing behaviour. A durable queue alone is not enough if messages are published in a non-durable way. A persistent message alone is not enough if the broker is configured to discard the queue.</p>\n<p>Durability also has cost. Persisting messages, confirming publishes, and replicating queues increase latency and resource use. Choose durability based on business impact. A cache refresh job and a payment capture job do not need the same guarantees.</p>\n<h2>Concurrency and ordering</h2>\n<p>Increasing workers improves throughput only until another constraint becomes the bottleneck. The database, a rate-limited API, a filesystem, or the broker itself can become the limiting dependency.</p>\n<p>Ordering is fragile under concurrency. If strict ordering matters, design for it explicitly. That may mean partitioning by key, using a single worker for a stream, or storing state transitions in a database and rejecting invalid transitions. Do not assume global ordering in a general purpose queue.</p>\n<h2>Backpressure</h2>\n<p>A growing queue is a signal that producers are adding work faster than workers can complete it. 
The right response might be to add workers, reduce producer rate, shed low priority work, pause a feature, or fix a slow dependency. Adding workers blindly can make the dependency fail faster.</p>\n<p>Track queue depth, message age, processing duration, failure rate, retry rate, dead letter count, and worker saturation. Message age is often more useful than queue length because it shows user visible delay.</p>\n<h2>Retries and dead letters</h2>\n<p>Retries should be bounded and delayed. Immediate retry can create a hot loop that repeatedly fails the same job. Use backoff for transient failures. Do not retry validation errors or permanent business rule failures: they will fail the same way until the input or the code changes.</p>\n<p>After the retry limit, move the job to a dead letter queue or failure store with enough context for investigation. Operators need the job type, safe identifiers, error class, attempt count, and timestamps. They also need a documented way to replay or discard failed jobs.</p>\n<h2>Payload design</h2>\n<p>Keep job payloads small and stable. Store identifiers rather than large object graphs. A worker can load current state from the database and decide whether the job is still needed. This avoids stale serialised objects and reduces broker memory pressure.</p>\n<p>Avoid putting secrets or unnecessary personal data in messages. Queues often have different retention, access, and logging paths from application databases.</p>\n<h2>When not to use a queue</h2>\n<p>Do not use a queue when the caller needs a confirmed result before continuing. Do not use a queue to hide slow database queries that still need to be fixed. Do not use a queue when the business operation requires a synchronous transaction and there is no safe compensating action.</p>\n<p>A queue changes the user experience. 
The system must expose pending, completed, and failed states where the user or another system needs to know what happened.</p>\n<h2>Conclusion</h2>\n<p>Queues are a basic building block for responsive and resilient systems, but they require disciplined design. Treat delivery as repeatable, make jobs idempotent, control concurrency, monitor message age, and provide a clear failure path. The queue should make work manageable, not invisible.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "reliability",
        "performance"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/rate-limiting-algorithms-and-trade-offs",
      "url": "https://soulstack.co.uk/blog/rate-limiting-algorithms-and-trade-offs",
      "title": "Rate limiting: algorithms and trade-offs",
      "summary": "Rate limiting protects an API from overload, abusive clients, accidental loops, and expensive spikes. The hard part is not adding a limit. The hard part is choosing an algorithm a…",
      "content_html": "<p>Rate limiting protects an API from overload, abusive clients, accidental loops, and expensive spikes. The hard part is not adding a limit. The hard part is choosing an algorithm and a response contract that fit the product, the traffic pattern, and the fairness model.</p>\n<h2>Decide what is being limited</h2>\n<p>Start with the unit of control. Limits can apply by IP address, authenticated user, organisation, API key, OAuth application, endpoint, tenant, region, or a combination. Public unauthenticated APIs often begin with IP-based limits, but IPs are a weak identity behind NAT, mobile networks, proxies, and attackers.</p>\n<p>Authenticated APIs should usually limit by principal and by application. Multi-tenant APIs often need tenant-wide limits as well, because one customer can create many users or tokens. Expensive endpoints may need stricter per-operation limits than cheap reads.</p>\n<h2>Fixed window counters</h2>\n<p>A fixed window counter allows a set number of requests per time window, such as 100 requests per minute. It is simple, cheap, and easy to explain. The downside is boundary burst. A client can send 100 requests at the end of one minute and 100 more at the start of the next minute.</p>\n<p>Fixed windows are useful for coarse quotas and low-risk APIs. They are less suitable when sharp bursts can overload dependencies.</p>\n<h2>Sliding window logs</h2>\n<p>A sliding window log stores request timestamps and counts only those within the current rolling period. It is accurate and avoids fixed boundary bursts. The cost is storage and cleanup, especially for high-cardinality identities and high request rates.</p>\n<p>Use sliding logs when precision matters and request volume is moderate. 
For very high volume systems, the memory and write overhead can become the limiting factor.</p>\n<h2>Sliding window counters</h2>\n<p>A sliding window counter approximates a rolling window by combining the current and previous fixed windows with weighting. It is cheaper than a full timestamp log and smoother than a basic fixed window.</p>\n<p>The trade-off is approximation. For most APIs, that approximation is good enough. It gives better burst control without storing every request timestamp.</p>\n<h2>Token bucket</h2>\n<p>A token bucket refills at a steady rate up to a maximum capacity. Each request consumes one or more tokens. When the bucket is empty, the request is limited. The capacity controls burst size. The refill rate controls sustained throughput.</p>\n<p>Token buckets are a strong default for APIs because they allow short bursts while enforcing a long-term rate. They also support weighted costs, where an expensive request consumes more tokens than a cheap request.</p>\n<h2>Leaky bucket</h2>\n<p>A leaky bucket processes requests at a steady rate, often through a queue. Bursts are smoothed into a constant drain rate until the queue fills. Once full, new requests are rejected or delayed.</p>\n<p>Leaky buckets are useful when a downstream system needs smooth traffic. They can add latency because requests wait in a queue. They are a poor fit for interactive APIs when clients expect immediate responses and retry behaviour is easier than server-side waiting.</p>\n<h2>Concurrency limits</h2>\n<p>Rate limits count requests over time. Concurrency limits cap work in progress. They are useful for expensive operations, long polling, report generation, uploads, and endpoints that hold database connections or worker slots.</p>\n<p>Concurrency limits should usually be combined with rate limits. 
A client can stay under a per-minute quota and still create too many simultaneous expensive requests.</p>\n<h2>Return useful limit responses</h2>\n<p>When a request is limited, return <code>429 Too Many Requests</code>. Include a clear error body and a retry signal. The <code>Retry-After</code> header can tell the client when to try again. Limit headers can expose remaining quota and reset times, but they must be accurate enough for clients to rely on.</p>\n<p>Do not return <code>500</code> for deliberate rate limiting. That makes clients treat a policy decision as a server fault.</p>\n<h2>Make clients part of the design</h2>\n<p>Good clients back off, add jitter, respect <code>Retry-After</code>, avoid polling when webhooks are available, cache where allowed, and stop retrying non-retryable errors. Document this behaviour. SDKs should implement it by default.</p>\n<p>Do not encourage clients to race the limit. If every response exposes an exact reset second, clients may stampede at the boundary. Jitter and token-based smoothing reduce that effect.</p>\n<h2>Protect fairness and cost</h2>\n<p>Rate limiting is a security, reliability, and cost control. OWASP classifies unrestricted resource consumption as an API security risk because API requests consume CPU, memory, network bandwidth, and storage, and sometimes paid third-party resources such as email and SMS.</p>\n<p>Set stricter limits for high-cost operations, authentication attempts, exports, search, webhook retries, and endpoints that trigger email, SMS, payment, or AI workloads. A single global request count is rarely enough.</p>\n<h2>Conclusion</h2>\n<p>Fixed windows are simple, sliding logs are accurate, sliding counters are efficient, token buckets balance burst and sustained traffic, leaky buckets smooth downstream load, and concurrency limits protect scarce workers. 
A good API usually combines these controls, exposes clear <code>429</code> responses, and gives clients enough guidance to slow down safely.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "api",
        "security",
        "reliability",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/reading-code-you-did-not-write",
      "url": "https://soulstack.co.uk/blog/reading-code-you-did-not-write",
      "title": "Reading code you did not write",
      "summary": "Reading unfamiliar code is a skill. The goal is not to understand every line on the first pass. The goal is to build a reliable map of behaviour, boundaries, and risk, so that whe…",
      "content_html": "<p>Reading unfamiliar code is a skill. The goal is not to understand every line on the first pass. The goal is to build a reliable map of behaviour, boundaries, and risk, so that when you finally change something you change the right thing.</p>\n<h2>Start with the purpose</h2>\n<p>Before opening files at random, identify why the code exists. Read the README, package metadata, command names, tests, issue links, and recent pull request descriptions. These artefacts often explain intent better than the implementation does.</p>\n<p>Look for the public surface first. For a library, find exported functions and documented examples. For an application, find routes, jobs, command handlers, or event consumers. For a build tool, find inputs, outputs, and the command that runs it.</p>\n<p>Purpose gives you a filter. Without it, every helper looks equally important.</p>\n<h2>Find the entry points</h2>\n<p>Entry points tell you where execution begins. Common examples include:</p>\n<ul>\n<li>HTTP route registration.</li>\n<li>Command line command definitions.</li>\n<li>Worker or queue consumers.</li>\n<li>Scheduled jobs.</li>\n<li>Package exports.</li>\n<li>Test setup.</li>\n<li>Build scripts.</li>\n</ul>\n<p>Once you find an entry point, follow one realistic path from input to output. Do not branch into every helper immediately. Mark unknowns and continue until you reach the result.</p>\n<h2>Use navigation tools, not just text search</h2>\n<p>Text search is useful, but code navigation gives stronger signals. Use go to definition to find the implementation behind a symbol. Use find references to see who calls it. Use call hierarchy to drill into callers of callers where the language server or IDE supports it.</p>\n<p>In an editor, peek definition and peek references can keep context visible while you inspect a path, so you do not lose your place. 
On hosted repositories, symbol navigation and code search can help you move between functions and classes before you have a local environment.</p>\n<p>Treat generated code, vendored code, and framework code differently from project code. They may be necessary, but they are rarely where you should start.</p>\n<h2>Read tests as executable documentation</h2>\n<p>Tests show expected behaviour, edge cases, and naming used by maintainers. Start with tests near the feature you are reading. Then look at fixtures, mocks, and setup code.</p>\n<p>A good test tells you what matters. A brittle test tells you what the code is coupled to. Both are useful.</p>\n<p>When there are no tests, look for examples, issue reports, or command output. If none exist, create a small local reproduction as you learn.</p>\n<h2>Build a vocabulary list</h2>\n<p>Unfamiliar code often contains domain terms that are not obvious from general programming knowledge. Write them down.</p>\n<p>For each term, capture:</p>\n<ul>\n<li>Where it is created.</li>\n<li>Where it is persisted.</li>\n<li>Whether it is user input, internal state, or derived data.</li>\n<li>Whether it is stable API language or local implementation language.</li>\n</ul>\n<p>This is especially useful when two terms look similar. An account, a user, a member, and a principal may not mean the same thing.</p>\n<h2>Trace data shape changes</h2>\n<p>Many bugs and misunderstandings happen at boundaries where data changes shape. Follow how data is parsed, validated, enriched, transformed, serialised, and stored.</p>\n<p>Pay attention to optional fields, defaults, null handling, date handling, identifiers, and error values. These details often explain why code that looks redundant is actually preserving compatibility.</p>\n<p>A diagram can help, but keep it small. A five-node sketch of input, validation, storage, output, and side effects is often enough.</p>\n<h2>Check error handling</h2>\n<p>Error handling reveals design intent. 
Look for which errors are retried, swallowed, wrapped, logged, returned to callers, or shown to users.</p>\n<p>Ask:</p>\n<ul>\n<li>Which errors are expected?</li>\n<li>Which errors are treated as impossible?</li>\n<li>Which errors cross process or network boundaries?</li>\n<li>Which errors include enough context for debugging?</li>\n<li>Which errors may expose private data?</li>\n</ul>\n<p>Do not assume that a catch block is correct just because it is deliberate. Error handling is often where old assumptions remain after the main path changes.</p>\n<h2>Understand side effects</h2>\n<p>Side effects are where a safe-looking read can become risky. Identify writes, network calls, cache updates, queue publishes, emails, file changes, metrics, and logs.</p>\n<p>For each side effect, ask when it happens, whether it can happen twice, and whether it is transactional with the main operation. Idempotency matters in retries, background jobs, and event-driven systems.</p>\n<h2>Read history carefully</h2>\n<p>Version control history can explain why code is shaped in a surprising way. Use blame and pull request links to find context, but do not treat old rationale as automatically current.</p>\n<p>A comment from three years ago may explain a workaround that is still required. It may also describe a dependency version that no longer exists. Verify before preserving or removing it.</p>\n<p>Permanent links to specific lines are useful when discussing code, because they point to a fixed commit rather than a moving branch.</p>\n<h2>Make a small change only after mapping the risk</h2>\n<p>Before changing unfamiliar code, state what you believe the affected surface is. Include direct callers, tests, configuration, public API, storage format, and operational side effects.</p>\n<p>Then make the smallest change that proves the path. Run the narrow test first, then the broader relevant suite. 
If there is no test, add one where possible.</p>\n<h2>Leave the map better than you found it</h2>\n<p>When you learn something non-obvious, preserve it. That may be a renamed variable, a clearer test name, a README note, a comment near a compatibility workaround, or a small architecture note.</p>\n<p>Do not document obvious syntax. Document decisions, boundaries, and traps.</p>\n<h2>Conclusion</h2>\n<p>Reading unfamiliar code is controlled exploration. Start with purpose, find entry points, follow one path, use navigation tools, read tests, trace data, and map side effects. You do not need to understand everything before making progress. You need to understand enough to change the right thing safely.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "cli",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/rebase-vs-merge-when-to-use-each",
      "url": "https://soulstack.co.uk/blog/rebase-vs-merge-when-to-use-each",
      "title": "Rebase vs merge: when to use each",
      "summary": "Git has two common ways to integrate work from one branch into another: merge and rebase. Both are useful, but they change history in different ways. The right choice depends on w…",
      "content_html": "<p>Git has two common ways to integrate work from one branch into another: merge and rebase. Both are useful, but they change history in different ways. The right choice depends on whether you need to preserve branch history, keep a linear history, or avoid rewriting commits that other people may already use.</p>\n<h2>What merge does</h2>\n<p><code>git merge</code> incorporates changes from another branch into the current branch. When the branches have diverged, Git combines the work and records the result, which is usually a merge commit.</p>\n<p>A typical merge looks like this:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#A5D6FF\"> main</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> merge</span><span style=\"color:#A5D6FF\"> feature/login-timeout</span></span></code></pre></div><p>Use merge when you want to preserve the fact that a branch existed and was integrated at a specific point. This is useful for shared branches, release branches, and changes where the branch topology is part of the project record.</p>\n<p>Merge does not rewrite the commits that already exist on the merged branch. That makes it safer for shared work.</p>\n<h2>What rebase does</h2>\n<p><code>git rebase</code> reapplies commits from the current branch onto a different base. 
The replayed commits are new commits with new identities, because their parent history changes.</p>\n<p>A typical rebase looks like this:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#A5D6FF\"> feature/login-timeout</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> fetch</span><span style=\"color:#A5D6FF\"> origin</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#A5D6FF\"> origin/main</span></span></code></pre></div><p>Use rebase when you want to update a private topic branch so it appears to start from the current default branch. This can make the final history easier to read, because it avoids unnecessary merge commits from repeatedly updating the branch.</p>\n<p>Rebase is also useful in interactive mode when cleaning local commits before review:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#79C0FF\"> -i</span><span style=\"color:#A5D6FF\"> HEAD~4</span></span></code></pre></div><p>Interactive rebase can reorder, edit, combine, or drop commits before they are shared.</p>\n<h2>The main difference is history rewriting</h2>\n<p>Merge adds integration history. 
Rebase rewrites the commits being replayed.</p>\n<p>That difference drives the safety rule: do not rebase commits that other people may already have based work on, unless the team has explicitly agreed to that workflow.</p>\n<p>When you rebase a shared branch, everyone else with the old commits must reconcile their local history with the rewritten branch. That creates avoidable confusion and can reintroduce commits that were intentionally changed or removed.</p>\n<h2>Use merge for shared branches</h2>\n<p>Use merge when:</p>\n<ul>\n<li>the branch is shared by more than one person</li>\n<li>the branch is a release branch</li>\n<li>preserving the integration point matters</li>\n<li>the team wants an explicit record of branch joins</li>\n<li>you are not confident that rewriting the branch is safe</li>\n</ul>\n<p>Merge is the conservative default for shared work, because it does not replace existing commits.</p>\n<p>For example, a release branch receiving a tested hotfix can be merged back into the default branch:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#A5D6FF\"> main</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> merge</span><span style=\"color:#A5D6FF\"> release/2026-06</span></span></code></pre></div><p>The merge commit shows that the release line and the main line were brought back together.</p>\n<h2>Use rebase for private topic branches</h2>\n<p>Use rebase when:</p>\n<ul>\n<li>the branch is private to you</li>\n<li>you want to update the branch on top of the latest default branch</li>\n<li>you want a linear set of commits before review</li>\n<li>you want to 
clean up local commits before pushing</li>\n<li>your team uses a fast forward or linear history policy</li>\n</ul>\n<p>Resolve conflicts commit by commit if they occur. Continue after each resolved conflict:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> add</span><span style=\"color:#FF7B72\"> &#x3C;</span><span style=\"color:#A5D6FF\">fil</span><span style=\"color:#E6EDF3\">e</span><span style=\"color:#FF7B72\">></span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#79C0FF\"> --continue</span></span></code></pre></div><p>Abort the rebase if the result is not what you expect:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#79C0FF\"> --abort</span></span></code></pre></div><h2>Do not use rebase to hide important context</h2>\n<p>A tidy history is useful, but it should not remove context that matters. Do not squash or rewrite away commits that explain a meaningful sequence of decisions, migrations, or compatibility steps.</p>\n<p>Clean history should make the project easier to understand. It should not make a risky change look simpler than it was.</p>\n<h2>Decide the policy before there is a conflict</h2>\n<p>Teams should decide their merge policy before a contentious pull request arrives. 
Common policies include the following.</p>\n<h3>Squash merge for pull requests</h3>\n<p>The branch can contain any local commit shape during review. The final merge creates one commit on the default branch. This works well when pull requests are the review unit.</p>\n<h3>Rebase and fast forward</h3>\n<p>Every branch is rebased onto the target branch before merge, then fast forwarded. This keeps a linear history but requires discipline around private branches and force pushing.</p>\n<h3>Merge commits</h3>\n<p>Every completed branch is merged with a merge commit. This preserves branch topology and makes integration points explicit.</p>\n<p>None of these policies is always correct. Pick one that matches the repository, team size, release process, and need for auditability.</p>\n<h2>Be careful with force push</h2>\n<p>After rebasing a branch that was already pushed, updating the remote branch usually requires a force push. Prefer the safer form:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> push</span><span style=\"color:#79C0FF\"> --force-with-lease</span></span></code></pre></div><p>This checks that the remote branch is still in the state your local repository expects before replacing it. It is still a history rewrite, so use it only when rewriting that branch is acceptable.</p>\n<h2>Conflict handling differs in feel</h2>\n<p>Merge presents conflicts for the combined integration result. 
Rebase presents conflicts as Git reapplies each commit, so you may need to resolve related conflicts more than once if several commits touch the same area.</p>\n<p>That is not a reason to avoid rebase entirely, but it is a reason to keep commits focused and topic branches short lived.</p>\n<h2>Conclusion</h2>\n<p>Use merge when preserving shared history and integration points matters. Use rebase to update or clean private topic branches before they are shared. The practical rule is simple: merge shared work, rebase private work, and choose a repository policy that the whole team follows consistently.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "cli",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/recovering-from-git-mistakes-reflog-reset-and-revert",
      "url": "https://soulstack.co.uk/blog/recovering-from-git-mistakes-reflog-reset-and-revert",
      "title": "Recovering from Git mistakes: reflog, reset and revert",
      "summary": "Git gives you several ways to recover from mistakes, but the right command depends on whether the change is local, shared, committed, or still only in the working tree. The safest…",
      "content_html": "<p>Git gives you several ways to recover from mistakes, but the right command depends on whether the change is local, shared, committed, or still only in the working tree. The safest recovery starts by identifying what state you need to preserve.</p>\n<h2>Stop and inspect the state</h2>\n<p>Before running a destructive command, inspect the repository:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> status</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> log</span><span style=\"color:#79C0FF\"> --oneline</span><span style=\"color:#79C0FF\"> --decorate</span><span style=\"color:#79C0FF\"> --graph</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#79C0FF\"> 20</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#79C0FF\"> --staged</span></span></code></pre></div><p>These commands show the current branch, staged changes, unstaged changes, and recent commits. 
If there is any work you cannot lose, save it before continuing.</p>\n<p>For a quick safety copy of uncommitted work, create a patch:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#FF7B72\"> ></span><span style=\"color:#A5D6FF\"> recovery.patch</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#79C0FF\"> --staged</span><span style=\"color:#FF7B72\"> ></span><span style=\"color:#A5D6FF\"> recovery-staged.patch</span></span></code></pre></div><p>A patch is not a substitute for understanding the state, but it gives you an extra recovery point before using commands that move branch tips or overwrite files.</p>\n<h2>Use reflog to find where a reference used to point</h2>\n<p>The reflog records when the tips of branches and other references were updated in the local repository. It is especially useful after a reset, rebase, merge, commit amend, or branch switch that moved <code>HEAD</code>.</p>\n<p>Show the recent positions of <code>HEAD</code>:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reflog</span></span></code></pre></div><p>You may see entries such as a rebase, reset, commit, checkout, or merge. 
Each entry has a selector such as <code>HEAD@{1}</code> or <code>HEAD@{2}</code>, meaning where <code>HEAD</code> used to point that many moves ago. Those selectors can be used with other Git commands.</p>\n<p>To inspect an old position before changing anything:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> show</span><span style=\"color:#A5D6FF\"> HEAD@{</span><span style=\"color:#79C0FF\">2</span><span style=\"color:#A5D6FF\">}</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> log</span><span style=\"color:#79C0FF\"> --oneline</span><span style=\"color:#79C0FF\"> --decorate</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#79C0FF\"> 5</span><span style=\"color:#A5D6FF\"> HEAD@{</span><span style=\"color:#79C0FF\">2</span><span style=\"color:#A5D6FF\">}</span></span></code></pre></div><p>To create a recovery branch at that point:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> recovery/old-head</span><span style=\"color:#A5D6FF\"> HEAD@{</span><span style=\"color:#79C0FF\">2</span><span style=\"color:#A5D6FF\">}</span></span></code></pre></div><p>Creating a branch is often safer than immediately resetting back. 
It preserves the candidate recovery point while you compare it with the current branch.</p>\n<h2>Use reset for local history changes</h2>\n<p><code>git reset</code> moves the current branch tip to another commit. Depending on the mode, it may also update the index and working tree.</p>\n<p>Use reset when the commits are local and have not been shared, or when everyone sharing the branch has agreed to rewrite it.</p>\n<h3>Undo the last commit but keep the changes staged</h3>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reset</span><span style=\"color:#79C0FF\"> --soft</span><span style=\"color:#A5D6FF\"> HEAD~1</span></span></code></pre></div><p>This moves the branch back by one commit and leaves the index and working tree unchanged, so the changes stay staged. Use it when the commit was made too early and you want to adjust the message or add more staged changes.</p>\n<h3>Undo the last commit and unstage the changes</h3>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reset</span><span style=\"color:#79C0FF\"> --mixed</span><span style=\"color:#A5D6FF\"> HEAD~1</span></span></code></pre></div><p>This is the default reset mode. It moves the branch back by one commit and updates the index to match, so nothing stays staged, while the file changes remain in the working tree. 
Use it when you want to split a commit or restage only part of the work.</p>\n<h3>Discard local changes and commits</h3>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reset</span><span style=\"color:#79C0FF\"> --hard</span><span style=\"color:#A5D6FF\"> HEAD~1</span></span></code></pre></div><p>This moves the branch and overwrites the index and working tree to match the target commit. It discards uncommitted changes and can remove untracked files in the affected paths. Use it only when you are certain the changes are not needed or you have saved them elsewhere.</p>\n<h2>Use revert for shared history</h2>\n<p><code>git revert</code> records a new commit that reverses the effect of an earlier commit. It does not remove the original commit from history.</p>\n<p>Use revert when the bad commit has already been pushed to a shared branch, especially the default branch. 
It keeps history consistent for everyone who has already fetched or based work on that commit.</p>\n<p>To revert one commit:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> revert</span><span style=\"color:#FF7B72\"> &#x3C;</span><span style=\"color:#A5D6FF\">commi</span><span style=\"color:#E6EDF3\">t</span><span style=\"color:#FF7B72\">></span></span></code></pre></div><p>To revert several commits without committing each one immediately:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> revert</span><span style=\"color:#79C0FF\"> --no-commit</span><span style=\"color:#FF7B72\"> &#x3C;</span><span style=\"color:#A5D6FF\">oldest-commi</span><span style=\"color:#E6EDF3\">t</span><span style=\"color:#FF7B72\">></span><span style=\"color:#A5D6FF\">^..</span><span style=\"color:#FF7B72\">&#x3C;</span><span style=\"color:#A5D6FF\">newest-commi</span><span style=\"color:#E6EDF3\">t</span><span style=\"color:#FF7B72\">></span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span></span></code></pre></div><p>The <code>--no-commit</code> flag applies the reverse changes to the working tree and index without making a commit. Review the resulting diff before committing. 
A revert is still a code change and can conflict with later work.</p>\n<h2>Recover from an accidental hard reset</h2>\n<p>If you ran <code>git reset --hard</code> and moved away from commits you still need, use the reflog.</p>\n<p>First, find the old position:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reflog</span></span></code></pre></div><p>Then create a recovery branch:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> recovery/before-reset</span><span style=\"color:#A5D6FF\"> HEAD@{</span><span style=\"color:#79C0FF\">1</span><span style=\"color:#A5D6FF\">}</span></span></code></pre></div><p>Inspect the branch. 
If it is correct, either keep it as the restored work branch or move the original branch back intentionally:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#A5D6FF\"> main</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reset</span><span style=\"color:#79C0FF\"> --hard</span><span style=\"color:#A5D6FF\"> recovery/before-reset</span></span></code></pre></div><p>Only do the final hard reset if that branch is local or rewriting it is safe for the team.</p>\n<h2>Recover from a bad rebase</h2>\n<p>If a rebase produced the wrong result, stop it while it is still running:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#79C0FF\"> --abort</span></span></code></pre></div><p>If the rebase has already completed, use the reflog to find the pre-rebase position:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reflog</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span 
style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> recovery/before-rebase</span><span style=\"color:#A5D6FF\"> HEAD@{</span><span style=\"color:#79C0FF\">3</span><span style=\"color:#A5D6FF\">}</span></span></code></pre></div><p>Compare the recovered branch with the current branch before deciding what to keep.</p>\n<h2>Recover a deleted branch</h2>\n<p>A deleted branch name can often be recreated if the commit it pointed to is still available in the reflog or another reference.</p>\n<p>Search the reflog for the branch tip or the work you recognise:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> reflog</span><span style=\"color:#79C0FF\"> --all</span></span></code></pre></div><p>Then recreate the branch:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> switch</span><span style=\"color:#79C0FF\"> -c</span><span style=\"color:#A5D6FF\"> feature/restored</span><span style=\"color:#FF7B72\"> &#x3C;</span><span style=\"color:#A5D6FF\">commi</span><span style=\"color:#E6EDF3\">t</span><span style=\"color:#FF7B72\">></span></span></code></pre></div><p>Reflog retention is not permanent. Reachable entries expire after 90 days by default and unreachable entries after 30 days, controlled by <code>gc.reflogExpire</code> and <code>gc.reflogExpireUnreachable</code>. 
Do not treat the reflog as a backup.</p>\n<h2>Choose the command by the risk</h2>\n<p>Use this rule of thumb:</p>\n<ul>\n<li>use <code>restore</code>, or the older <code>checkout</code> form, for individual files</li>\n<li>use <code>reset</code> for local commits that can be rewritten</li>\n<li>use <code>reflog</code> to find previous reference positions</li>\n<li>use <code>revert</code> for changes already shared with others</li>\n<li>create a recovery branch before making another destructive move</li>\n</ul>\n<p>The dangerous part is not the command name. The dangerous part is changing history that other people already have, or overwriting working tree changes that were never saved.</p>\n<h2>Conclusion</h2>\n<p>Git recovery is manageable when you slow down and inspect the state first. Reflog helps you find where <code>HEAD</code> and branches used to point. Reset is for local history you can safely rewrite. Revert is for shared history that must remain consistent. When in doubt, create a recovery branch before making the next change.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "cli",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/setting-up-mfa-and-why-it-matters",
      "url": "https://soulstack.co.uk/blog/setting-up-mfa-and-why-it-matters",
      "title": "Setting up MFA and why it matters",
      "summary": "Multi-factor authentication, usually called MFA, requires more than one type of evidence before a user can sign in. It matters because passwords are often guessed, reused, phished…",
      "content_html": "<p>Multi-factor authentication, usually called MFA, requires more than one type of evidence before a user can sign in. It matters because passwords are often guessed, reused, phished or stolen from other services, and a second factor limits the damage when that happens.</p>\n<h2>What MFA adds</h2>\n<p>A password is something the user knows. MFA adds another factor, such as something the user has or something the user is.</p>\n<p>Common factors include authenticator apps, security keys, platform passkeys, smart cards and biometrics. Some systems also use SMS, email codes or phone calls, but these are weaker than phishing resistant methods.</p>\n<p>MFA does not make accounts impossible to compromise. It raises the cost for attackers and reduces the damage from stolen passwords.</p>\n<h2>Prefer phishing resistant MFA</h2>\n<p>Phishing resistant MFA is designed so a fake site cannot capture a reusable code and replay it to the real site.</p>\n<p>Passkeys, FIDO2 security keys and other WebAuthn based authenticators are strong choices because authentication is bound to the legitimate origin. The credential is scoped to the relying party derived from the web origin, so the authenticator will not complete the same sign in for a lookalike phishing site.</p>\n<p>Where phishing resistant MFA is available, use it for administrators, developers, finance users and anyone with access to sensitive data.</p>\n<h2>Use authenticator apps when stronger options are not available</h2>\n<p>Time based one time password apps are usually better than relying on a password alone. They work offline and avoid some weaknesses of SMS.</p>\n<p>They are still phishable because a user can type a current code into a fake login page. 
They also need careful recovery planning because losing the device can lock out the account.</p>\n<p>Use them when passkeys or security keys are not available, but do not treat them as the strongest option.</p>\n<h2>Treat SMS and email codes as fallback methods</h2>\n<p>SMS and email codes are common because they are easy to deploy, but they have weaknesses. Phones can be lost, numbers can be transferred through SIM swaps or porting, messages can be intercepted, and email accounts are often already the recovery path for many services.</p>\n<p>Use SMS or email codes only when better factors are unavailable or as a carefully controlled recovery option.</p>\n<p>For high risk accounts, avoid letting a weaker fallback bypass a stronger primary factor.</p>\n<h2>Set MFA up safely</h2>\n<p>Start with the most important accounts: email, password manager, source control, cloud provider, domain registrar, payment systems, identity provider and production administration.</p>\n<p>Register at least two strong authenticators where the service allows it. For example, use a platform passkey plus a hardware security key, or two hardware security keys stored separately.</p>\n<p>Save recovery codes in a password manager or another protected location. Do not store them in the same inbox or device that they are meant to recover.</p>\n<p>Remove old devices and factors when they are replaced. Review MFA methods after role changes, device loss or suspected compromise.</p>\n<h2>Protect account recovery</h2>\n<p>MFA is only as strong as the recovery process. If support can disable MFA after weak checks, attackers will target recovery instead of the login screen.</p>\n<p>Use recovery methods that match the risk of the account. Administrative accounts need stronger recovery than low risk consumer accounts.</p>\n<p>Log and alert on MFA resets, new factor enrolment, recovery code use and factor removal.</p>\n<h2>Make MFA usable</h2>\n<p>Security controls fail when users cannot use them reliably. 
Choose methods that fit the users, devices and environment.</p>\n<p>Support backup factors. Provide clear enrolment steps. Test recovery before an incident. Avoid excessive prompts that train users to approve without thinking.</p>\n<p>For workforce systems, combine MFA with single sign-on and device management where appropriate. That reduces repeated prompts while keeping access controlled.</p>\n<h2>Conclusion</h2>\n<p>MFA reduces the risk from stolen passwords, but method choice matters. Prefer phishing resistant passkeys or security keys, use authenticator apps when stronger options are unavailable, keep weak fallbacks under control, and protect recovery with the same care as sign in.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/soft-deletes-audit-trails-and-history-tables",
      "url": "https://soulstack.co.uk/blog/soft-deletes-audit-trails-and-history-tables",
      "title": "Soft deletes, audit trails and history tables",
      "summary": "Deleting data is rarely one decision. A system may need to hide a record from normal use, prove who changed it, restore it later, retain it for a legal period or remove it permane…",
      "content_html": "<p>Deleting data is rarely one decision. A system may need to hide a record from normal use, prove who changed it, restore it later, retain it for a legal period or remove it permanently. Soft deletes, audit trails and history tables solve different parts of that problem. Treating them as the same feature creates confusing data and weak compliance controls.</p>\n<h2>Soft deletes hide current records</h2>\n<p>A soft delete marks a row as deleted instead of removing it. The common shape is a nullable timestamp such as <code>deleted_at</code>, sometimes with <code>deleted_by</code> and <code>delete_reason</code>.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">ALTER</span><span style=\"color:#FF7B72\"> TABLE</span><span style=\"color:#E6EDF3\"> users</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">ADD</span><span style=\"color:#E6EDF3\"> COLUMN deleted_at </span><span style=\"color:#FF7B72\">timestamp</span><span style=\"color:#FF7B72\"> NULL</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><p>Normal application queries then filter out deleted rows.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">SELECT</span><span style=\"color:#FF7B72\"> *</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">FROM</span><span style=\"color:#E6EDF3\"> users</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#E6EDF3\"> deleted_at </span><span 
style=\"color:#FF7B72\">IS</span><span style=\"color:#FF7B72\"> NULL</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><p>Soft deletes are useful when users need undo, when related records still refer to the row or when deletion should be reviewed before purge. They are not a complete audit trail because they usually store only the latest deletion state, not every change that happened before it.</p>\n<h2>Soft deletes have costs</h2>\n<p>Every query must handle the deletion predicate correctly. Missing <code>deleted_at IS NULL</code> can expose hidden data. Adding it everywhere also makes indexes more important, because active rows are now a subset of the table.</p>\n<p>A partial index can help in engines that support it. PostgreSQL, for example, supports partial indexes with an arbitrary <code>WHERE</code> predicate so long as it only references columns of the table being indexed.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> INDEX</span><span style=\"color:#D2A8FF\"> users_active_email_idx</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">ON</span><span style=\"color:#E6EDF3\"> users (email)</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">WHERE</span><span style=\"color:#E6EDF3\"> deleted_at </span><span style=\"color:#FF7B72\">IS</span><span style=\"color:#FF7B72\"> NULL</span><span style=\"color:#E6EDF3\">;</span></span></code></pre></div><p>Uniqueness needs deliberate design. If a deleted row keeps its email address, should a new active row be allowed to reuse it? The answer affects unique indexes, restoration behaviour and user support tools.</p>\n<p>Soft-deleted data also still exists. 
It may still be included in backups, analytics exports, search indexes and replicas. Do not present soft delete as permanent erasure.</p>\n<h2>Audit trails record events</h2>\n<p>An audit trail records who did what, when, and often why. It is event-shaped rather than state-shaped. A useful audit event includes the actor, action, target, timestamp, request or correlation id, and enough before and after data to explain the change.</p>\n<p>Audit trails should be append-only from the application point of view. The system should not update old audit events during normal business flow.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">CREATE</span><span style=\"color:#FF7B72\"> TABLE</span><span style=\"color:#D2A8FF\"> audit_events</span><span style=\"color:#E6EDF3\"> (</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  id </span><span style=\"color:#FF7B72\">bigint</span><span style=\"color:#FF7B72\"> generated</span><span style=\"color:#FF7B72\"> always</span><span style=\"color:#FF7B72\"> as</span><span style=\"color:#FF7B72\"> identity</span><span style=\"color:#FF7B72\"> primary key</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  actor_id </span><span style=\"color:#FF7B72\">bigint</span><span style=\"color:#FF7B72\"> NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  action</span><span style=\"color:#FF7B72\"> text</span><span style=\"color:#FF7B72\"> NOT NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  target_table </span><span style=\"color:#FF7B72\">text</span><span style=\"color:#FF7B72\"> NOT 
NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  target_id </span><span style=\"color:#FF7B72\">text</span><span style=\"color:#FF7B72\"> NOT NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  occurred_at </span><span style=\"color:#FF7B72\">timestamptz</span><span style=\"color:#FF7B72\"> NOT NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  before_data jsonb </span><span style=\"color:#FF7B72\">NULL</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  after_data jsonb </span><span style=\"color:#FF7B72\">NULL</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">);</span></span></code></pre></div><p>Do not rely only on application logs for audit. Logs are often sampled, rotated, redacted or stored for operations rather than evidence. A database audit table or dedicated audit system makes retention and querying explicit.</p>\n<h2>History tables record versions</h2>\n<p>A history table stores previous versions of a row. It answers questions such as what this record looked like last Tuesday, or which values changed over time.</p>\n<p>Some databases provide system-versioned temporal tables. In SQL Server, a system-versioned temporal table keeps current rows in one table while the Database Engine automatically moves each previous row version into a separate history table on update or delete. These are useful for point-in-time analysis, but they still require retention planning because history can grow quickly.</p>\n<p>Manual history tables can work well when the required versioning semantics are simple. 
They should include validity columns, a stable entity id, the changed data and metadata about the change.</p>\n<h2>Choose the right mechanism</h2>\n<p>Use soft deletes when the main requirement is to remove a row from normal application behaviour without immediately purging it.</p>\n<p>Use audit trails when the main requirement is accountability. The audit record should explain the action, actor and context even if the current row later changes again.</p>\n<p>Use history tables when the main requirement is point-in-time reconstruction or comparison between versions.</p>\n<p>Many systems need more than one mechanism. For example, a user deletion can set <code>deleted_at</code>, write an audit event and preserve row versions for a defined retention period.</p>\n<h2>Plan retention and purge</h2>\n<p>Retention is part of the design, not a cleanup task to add later. Decide how long soft-deleted rows, audit events and history versions must be kept. Decide who can purge them and how purge is verified.</p>\n<p>A purge job should be explicit, batched and observable. It should not accidentally remove records that are still needed for support, billing, legal hold or referential integrity.</p>\n<p>For privacy-sensitive data, consider whether fields should be redacted while the audit event remains. Sometimes the system needs to keep proof that an action occurred without keeping the full personal data payload forever.</p>\n<h2>Protect integrity</h2>\n<p>Soft deletes can break assumptions in foreign keys and unique constraints. Decide whether child records are also soft-deleted, prevented from referencing deleted parents or allowed to keep historical references.</p>\n<p>Audit and history tables should be protected from casual edits. Restrict write permissions, monitor changes and include enough metadata to trace the source of each event.</p>\n<h2>Conclusion</h2>\n<p>Soft deletes, audit trails and history tables are related but not interchangeable. 
Soft deletes control visibility, audit trails prove actions, and history tables preserve versions. A robust design names the requirement first, then chooses the mechanism, indexes the active path and defines retention before the table grows without limit.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "databases",
        "sql",
        "architecture",
        "security"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/ssh-config-aliases-keys-and-jump-hosts",
      "url": "https://soulstack.co.uk/blog/ssh-config-aliases-keys-and-jump-hosts",
      "title": "SSH config: aliases, keys and jump hosts",
      "summary": "OpenSSH client configuration turns long, error-prone ssh commands into named hosts with explicit users, keys and routing. A small ~/.ssh/config file is easier to review than a she…",
      "content_html": "<p>OpenSSH client configuration turns long, error-prone <code>ssh</code> commands into named hosts with explicit users, keys and routing. A small <code>~/.ssh/config</code> file is easier to review than a shell history full of flags.</p>\n<h2>Put repeated options in host blocks</h2>\n<p>The OpenSSH client reads its per-user configuration from <code>~/.ssh/config</code>. The file is made of <code>Host</code> sections. Each section applies to host patterns that match the name used on the command line.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host app-prod</span></span>\n<span class=\"line\"><span>  HostName app-01.example.com</span></span>\n<span class=\"line\"><span>  User deploy</span></span>\n<span class=\"line\"><span>  Port 22</span></span></code></pre></div><p>After this, the connection is just:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ssh</span><span style=\"color:#A5D6FF\"> app-prod</span></span></code></pre></div><p>Use aliases for intent, not only for hostnames. <code>app-prod</code> is usually more useful than <code>app-01</code>, because it tells the reader why the connection exists.</p>\n<h2>Keep defaults separate from specific hosts</h2>\n<p>Use a wildcard <code>Host *</code> section for safe defaults. 
For each option, ssh uses the first value it obtains, so put the wildcard section after more specific entries when you rely on that behaviour.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host app-prod</span></span>\n<span class=\"line\"><span>  HostName app-01.example.com</span></span>\n<span class=\"line\"><span>  User deploy</span></span>\n<span class=\"line\"><span>  IdentityFile ~/.ssh/id_ed25519_app_prod</span></span>\n<span class=\"line\"><span></span></span>\n<span class=\"line\"><span>Host *</span></span>\n<span class=\"line\"><span>  AddKeysToAgent no</span></span>\n<span class=\"line\"><span>  IdentitiesOnly yes</span></span>\n<span class=\"line\"><span>  ServerAliveInterval 30</span></span></code></pre></div><p>Keep defaults boring. A global option affects every connection, including Git remotes and one-off diagnostics.</p>\n<h2>Use explicit keys</h2>\n<p><code>IdentityFile</code> specifies a file from which the client reads an authentication identity. 
Use one key per trust boundary when that makes access review easier.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host git-internal</span></span>\n<span class=\"line\"><span>  HostName git.example.com</span></span>\n<span class=\"line\"><span>  User git</span></span>\n<span class=\"line\"><span>  IdentityFile ~/.ssh/id_ed25519_git_internal</span></span>\n<span class=\"line\"><span>  IdentitiesOnly yes</span></span></code></pre></div><p><code>IdentitiesOnly yes</code> tells the client to use only the configured identity files, even when an agent offers more identities. This avoids surprising key attempts, and it can prevent authentication failures on servers that limit the number of tries.</p>\n<p>Set strict file permissions for private keys and the config file.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">chmod</span><span style=\"color:#79C0FF\"> 700</span><span style=\"color:#A5D6FF\"> ~/.ssh</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">chmod</span><span style=\"color:#79C0FF\"> 600</span><span style=\"color:#A5D6FF\"> ~/.ssh/config</span><span style=\"color:#A5D6FF\"> ~/.ssh/id_ed25519_git_internal</span></span></code></pre></div><p>Do not commit private keys. Do not paste private keys into issue trackers, chat or deployment logs.</p>\n<h2>Use jump hosts with ProxyJump</h2>\n<p>A jump host is an intermediate SSH server used to reach another host. 
OpenSSH supports this with <code>ProxyJump</code>, which is the config-file form of the <code>-J</code> command line option. The client first connects to the jump host, then establishes a forwarded connection to the final target.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host bastion</span></span>\n<span class=\"line\"><span>  HostName bastion.example.com</span></span>\n<span class=\"line\"><span>  User lee</span></span>\n<span class=\"line\"><span>  IdentityFile ~/.ssh/id_ed25519_bastion</span></span>\n<span class=\"line\"><span></span></span>\n<span class=\"line\"><span>Host app-private</span></span>\n<span class=\"line\"><span>  HostName 10.0.12.34</span></span>\n<span class=\"line\"><span>  User deploy</span></span>\n<span class=\"line\"><span>  IdentityFile ~/.ssh/id_ed25519_app</span></span>\n<span class=\"line\"><span>  ProxyJump bastion</span></span></code></pre></div><p>Now the private host is reached with:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ssh</span><span style=\"color:#A5D6FF\"> app-private</span></span></code></pre></div><p>The configuration for the target connection stays on the originating machine. Do not assume the jump host&#39;s local SSH config will be used for the final target.</p>\n<h2>Avoid agent forwarding by default</h2>\n<p>Agent forwarding can be useful, but it exposes the local agent to the remote environment. 
<code>ForwardAgent</code> is off by default; keep it that way unless a specific workflow requires it and the remote host is trusted.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host *</span></span>\n<span class=\"line\"><span>  ForwardAgent no</span></span></code></pre></div><p>When a workflow needs access from a remote host to another system, prefer alternatives such as <code>ProxyJump</code>, a limited key deployed for that environment, or short-lived credentials.</p>\n<h2>Manage host key checking deliberately</h2>\n<p>Host keys protect against connecting to the wrong server. Avoid disabling host key checking globally.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>Host *</span></span>\n<span class=\"line\"><span>  StrictHostKeyChecking ask</span></span>\n<span class=\"line\"><span>  UserKnownHostsFile ~/.ssh/known_hosts</span></span></code></pre></div><p>If infrastructure rebuilds hosts often, solve host key rotation in provisioning or known-host management. 
Do not normalise <code>StrictHostKeyChecking no</code> across all hosts.</p>\n<h2>Debug the effective configuration</h2>\n<p>Use <code>ssh -G</code> to print the configuration that OpenSSH would use after evaluating <code>Host</code> and <code>Match</code> blocks.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ssh</span><span style=\"color:#79C0FF\"> -G</span><span style=\"color:#A5D6FF\"> app-private</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> grep</span><span style=\"color:#79C0FF\"> -E</span><span style=\"color:#A5D6FF\"> '^(hostname|user|identityfile|proxyjump) '</span></span></code></pre></div><p>Use verbose mode for connection diagnostics. Repeat <code>-v</code> up to three times for more detail.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ssh</span><span style=\"color:#79C0FF\"> -vvv</span><span style=\"color:#A5D6FF\"> app-private</span></span></code></pre></div><p>Verbose output can include paths, usernames and key information. Treat logs from failed SSH sessions as sensitive when sharing them.</p>\n<h2>Keep the file maintainable</h2>\n<p>Group hosts by environment or purpose. Add comments only for facts that are not obvious from the options. 
Remove aliases when systems are decommissioned.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span># Production application hosts are reached through the bastion.</span></span>\n<span class=\"line\"><span>Host prod-*</span></span>\n<span class=\"line\"><span>  User deploy</span></span>\n<span class=\"line\"><span>  ProxyJump bastion</span></span></code></pre></div><p>Use pattern blocks carefully. A broad pattern such as <code>Host *prod*</code> may match more than intended.</p>\n<h2>Conclusion</h2>\n<p>A good SSH config makes secure connections boring. Use named aliases, explicit users, explicit identity files and <code>ProxyJump</code> for private networks. Keep global defaults conservative, leave agent forwarding off by default and use <code>ssh -G</code> when the effective configuration is unclear.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "cli",
        "security",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/storing-passwords-safely",
      "url": "https://soulstack.co.uk/blog/storing-passwords-safely",
      "title": "Storing passwords safely",
      "summary": "Password storage has one goal: keep user passwords protected even if the application database is copied by an attacker. That means never storing plaintext passwords, never using r…",
      "content_html": "<p>Password storage has one goal: keep user passwords protected even if the application database is copied by an attacker. That means never storing plaintext passwords, never using reversible encryption for normal login passwords, and never relying on fast hashes.</p>\n<h2>Store password verifiers, not passwords</h2>\n<p>A login system does not need to know the user&#39;s password after registration. It only needs to verify that a later attempt matches the original secret.</p>\n<p>The safe pattern is to run the password through a password hashing algorithm and store the resulting verifier. On login, run the submitted password through the same algorithm and compare the result.</p>\n<p>The stored value should include the algorithm and parameters needed for verification. That makes future migration possible without forcing every user to reset their password at once.</p>\n<h2>Use a password hashing algorithm</h2>\n<p>Use a dedicated password hashing algorithm, not a general purpose hash.</p>\n<p>Argon2id is a strong default when it is available. OWASP also documents scrypt, bcrypt and PBKDF2 as options when they fit the platform or compliance requirement. Bcrypt is best reserved for legacy systems, and it silently truncates input beyond 72 bytes, so enforce that limit if you use it.</p>\n<p>The important property is cost. Password hashing should be deliberately expensive enough to slow large scale guessing, while still acceptable for normal login traffic. General purpose hashes such as SHA-256 are too fast for password storage.</p>\n<h2>Use a unique salt for every password</h2>\n<p>Each password must have a unique salt. The salt prevents attackers from using one precomputed table against every account and stops equal passwords from producing equal stored values.</p>\n<p>The salt is not secret. It should be generated with a cryptographically secure random source and stored with the password verifier. 
Most modern password hashing libraries generate and embed the salt for you.</p>\n<p>Do not use a single application-wide salt. That gives weaker protection than per-password salts and creates a shared value that can become a migration problem.</p>\n<h2>Consider a pepper only as defence in depth</h2>\n<p>A pepper is an additional secret value used alongside the password hashing process. Unlike a salt, a pepper must be kept secret and stored away from the password database, ideally in a secrets vault or hardware security module.</p>\n<p>A pepper can reduce damage if only the database is stolen. It does not replace strong password hashing, unique salts or secure implementation. If the pepper is exposed, rotate it and expect affected users to reset their passwords, because existing verifiers cannot be recomputed under a new pepper without the original password.</p>\n<h2>Validate without leaking information</h2>\n<p>The registration and reset flows should help users choose strong passwords without exposing account state or sensitive implementation details.</p>\n<p>Use a sensible minimum length and a generous maximum, allowing inputs of at least 64 characters so that passphrases work. Do not impose outdated complexity rules that push users towards predictable substitutions. Screen new passwords against known compromised password lists where practical.</p>\n<p>Avoid login and reset responses that reveal whether an account exists. Rate limit authentication attempts and protect password reset flows with short-lived, single-use tokens.</p>\n<h2>Migrate old hashes safely</h2>\n<p>Old password storage schemes should be upgraded. The usual approach is lazy migration: when a user logs in successfully, verify the old hash, then rehash the submitted password with the current algorithm and parameters.</p>\n<p>Use an explicit version marker for each stored verifier. 
That avoids guessing how a value was produced and lets the application know which accounts still need migration.</p>\n<p>Force resets only when needed, such as after a confirmed leak, an unsafe legacy format that cannot be verified safely, or a compromised pepper.</p>\n<h2>Operational controls matter</h2>\n<p>Password storage is not only an algorithm choice. Restrict database access, protect backups, monitor unusual authentication activity and keep authentication libraries patched.</p>\n<p>Do not log plaintext passwords, password reset tokens or password hash values. Redact authentication inputs in application logs and telemetry.</p>\n<p>Treat authentication code as high-risk code. Keep it small, tested and reviewed.</p>\n<h2>Conclusion</h2>\n<p>Safe password storage uses slow, dedicated password hashing with a unique salt per password and current parameters. Reversible encryption, fast hashing and custom schemes are the wrong tools. Design the surrounding flows so passwords, reset tokens and verifiers do not leak through logs, backups or account recovery.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/structuring-a-project-for-newcomers",
      "url": "https://soulstack.co.uk/blog/structuring-a-project-for-newcomers",
      "title": "Structuring a project for newcomers",
      "summary": "A project structure is part of the developer interface. A clear layout reduces the time between opening a repository and making a safe first change, so treat directory names and e…",
      "content_html": "<p>A project structure is part of the developer interface. A clear layout reduces the time between opening a repository and making a safe first change, so treat directory names and entry points as something readers depend on.</p>\n<h2>Optimise for orientation</h2>\n<p>A newcomer starts with questions, not context. They need to know where the application starts, where tests live, how code is grouped, where configuration belongs, and which files are generated.</p>\n<p>The top level should make those answers visible. A crowded root makes every file look equally important. Keep the root for files that define the repository as a whole: README, licence, package or build metadata, test configuration, editor configuration, and high value documentation.</p>\n<p>Move supporting detail into named directories. Names such as <code>src</code>, <code>tests</code>, <code>docs</code>, <code>scripts</code>, and <code>fixtures</code> are useful because they describe purpose. Custom names can work, but only when the README explains them.</p>\n<h2>Put the entry point where readers expect it</h2>\n<p>Every project has at least one entry point. It may be a web server bootstrap, a command line executable, a package export, a worker, a migration runner, or a static site build script. Make the main path easy to find.</p>\n<p>For a small package, a flat structure can be clearer than an elaborate one. For a larger application, group code by domain or feature when that matches how changes are made. Avoid splitting every concept by technical type if a simple feature change then requires edits across many distant directories.</p>\n<p>A good structure helps answer &quot;where would this change go?&quot; without asking a maintainer.</p>\n<h2>Use conventional files for ecosystem tooling</h2>\n<p>Each ecosystem has files that tools and developers already know how to find. 
Use those conventions unless there is a strong reason not to.</p>\n<p>For a Node package, <code>package.json</code> describes the package and its scripts. The <code>repository</code> field points to where the source code lives, which helps anyone who wants to contribute. The <code>scripts</code> field defines named commands that can be run through npm. For a monorepo, the top-level <code>workspaces</code> field lists local packages that are installed and linked together.</p>\n<p>For a Go module, <code>go.mod</code> declares the module path with a single <code>module</code> directive, and that path is also the import prefix for every package in the module. Small modules can keep code in the root. More complex projects often separate executable commands from reusable packages.</p>\n<p>For Python, package layout, import paths, and command entry points should follow the packaging tool in use. Avoid clever import path tricks, because they make local execution and test execution harder to reason about.</p>\n<p>The goal is not to copy a template blindly. It is to let standard tools work without special instructions.</p>\n<h2>Keep generated files out of the way</h2>\n<p>Generated files should be clearly marked by location, naming, or comments. A newcomer should not waste time editing output that will be overwritten.</p>\n<p>If generated files are committed, explain why. Common reasons include generated clients, checked-in lock files, or static assets required by a deployment target. Include the command that regenerates them and the expected review approach.</p>\n<p>If generated files are not committed, make sure setup and build commands create them reliably.</p>\n<h2>Name directories by responsibility</h2>\n<p>Directory names should describe stable responsibilities, not temporary implementation detail. <code>auth</code>, <code>billing</code>, <code>search</code>, and <code>content</code> can be good feature names. 
<code>misc</code>, <code>common</code>, <code>new</code>, and <code>temp</code> are warning signs.</p>\n<p>A <code>utils</code> directory often becomes a dumping ground. Prefer specific names such as <code>date-formatting</code>, <code>http-client</code>, or <code>validation</code> when the code has a clear responsibility. If a helper has no clear owner, that may be a design problem rather than a naming problem.</p>\n<h2>Make tests easy to connect to code</h2>\n<p>Tests should be discoverable from the code they protect. That can mean colocated tests, a top-level <code>tests</code> directory, or a hybrid approach. The important rule is consistency.</p>\n<p>Document the test command in the README. If there are different levels of tests, name them plainly:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> test</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> run</span><span style=\"color:#A5D6FF\"> test:integration</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> run</span><span style=\"color:#A5D6FF\"> test:e2e</span></span></code></pre></div><p>Do not require a newcomer to know hidden CI commands before they can run a local check.</p>\n<h2>Put scripts behind stable commands</h2>\n<p>A <code>scripts</code> directory is useful, but the public interface should be stable command names. 
Instead of asking developers to remember a long file path, expose common tasks through the package manager, task runner, make target, or documented command.</p>\n<p>Use scripts for repeatable repository tasks: generating content, validating metadata, checking formatting, seeding local data, or cleaning build output. Keep scripts deterministic and safe to rerun where possible.</p>\n<h2>Document configuration boundaries</h2>\n<p>Configuration is often where newcomers get stuck. Separate example configuration from real secrets. Use names that make intent clear, such as <code>.env.example</code> for a committed template and a private local file for real values.</p>\n<p>Do not commit real credentials. Do not require a reader to guess which variables are mandatory. Explain each required variable, acceptable values, and where it is used.</p>\n<p>When configuration differs between local development, tests, preview, and production, document the boundary. A single unclear environment file can cause tests to depend on live services by accident.</p>\n<h2>Make ownership visible without personal names</h2>\n<p>A public repository can show ownership through process rather than informal knowledge. Use <code>CODEOWNERS</code>, contribution guidelines, issue templates, and pull request templates when the platform supports them.</p>\n<p>Ownership documentation should answer how changes are reviewed, not who happens to know the code. The structure should reduce dependence on private context.</p>\n<h2>Review structure as the project grows</h2>\n<p>A structure that is right for ten files may be wrong for five hundred. Revisit layout when changes repeatedly cross unrelated directories, tests become hard to find, or setup needs undocumented manual steps.</p>\n<p>Do not reorganise for neatness alone. Reorganise when it improves navigation, reduces coupling, or makes standard commands simpler.</p>\n<h2>Conclusion</h2>\n<p>A newcomer-friendly project is not the one with the most directories. 
It is the one with visible entry points, conventional metadata, clear command names, discoverable tests, and documented configuration. Structure is successful when the next change has an obvious home.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "git",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/the-owasp-top-risks-explained-simply",
      "url": "https://soulstack.co.uk/blog/the-owasp-top-risks-explained-simply",
      "title": "The OWASP top risks, explained simply",
      "summary": "The OWASP Top 10 is an awareness document for the most common web application security risks, and the current released version is the 2025 list. It is not a checklist that proves…",
      "content_html": "<p>The OWASP Top 10 is an awareness document for the most common web application security risks, and the current released version is the 2025 list. It is not a checklist that proves an application is safe, but it is a shared starting point that helps teams talk about risk, prioritise work and decide what to test. This post walks through each category in plain language: what it means, what it looks like in practice and how to prevent it.</p>\n<h2>A01 Broken access control</h2>\n<p>Broken access control means users can do something they should not be allowed to do.</p>\n<p>Examples include reading another user&#39;s record by changing an ID in a URL, using an admin endpoint as a normal user, or modifying a request to bypass a workflow step.</p>\n<p>Prevent it by enforcing authorisation on the server for every protected action. Do not rely on hidden buttons, client side checks or route names.</p>\n<h2>A02 Security misconfiguration</h2>\n<p>Security misconfiguration means a system is deployed with unsafe settings.</p>\n<p>Examples include default credentials, verbose error messages, open cloud storage, missing security headers, unnecessary services and overly broad network access.</p>\n<p>Prevent it with hardened defaults, automated configuration checks, repeatable infrastructure and parity between environments.</p>\n<h2>A03 Software supply chain failures</h2>\n<p>Software supply chain failures happen when the application depends on compromised, vulnerable or untrusted software.</p>\n<p>Examples include vulnerable packages, malicious dependency updates, unsigned artefacts and unverified build steps.</p>\n<p>Prevent it by tracking dependencies, pinning and reviewing updates, scanning for known vulnerabilities, verifying artefacts and protecting build pipelines.</p>\n<h2>A04 Cryptographic failures</h2>\n<p>Cryptographic failures happen when sensitive data is not protected correctly.</p>\n<p>Examples include sending data over plaintext connections, 
using weak algorithms, reusing keys incorrectly, storing passwords with fast hashes, or exposing encryption keys.</p>\n<p>Prevent it by classifying sensitive data, using current libraries, applying TLS correctly, storing passwords with a slow password hashing function, and managing keys separately from data.</p>\n<h2>A05 Injection</h2>\n<p>Injection happens when untrusted input is interpreted as part of a command, query or expression.</p>\n<p>SQL injection is the classic example, but the same pattern appears in operating system commands, LDAP queries, template engines and other interpreters.</p>\n<p>Prevent it with parameterised APIs, safe frameworks, input validation, and by avoiding string concatenation for commands or queries.</p>\n<h2>A06 Insecure design</h2>\n<p>Insecure design is a flaw in the intended behaviour of a system, not just a coding mistake.</p>\n<p>Examples include password reset flows that trust weak proof, business rules that allow abuse, or workflows that assume users will not tamper with requests.</p>\n<p>Prevent it with threat modelling, secure design reviews, abuse case analysis and explicit security requirements before implementation.</p>\n<h2>A07 Authentication failures</h2>\n<p>Authentication failures let attackers pretend to be another user.</p>\n<p>Examples include weak password reset flows, missing multi factor authentication, exposure to credential stuffing, predictable session tokens and poor session invalidation.</p>\n<p>Prevent it with strong authentication flows, multi factor authentication for important accounts, safe session handling, rate limiting and secure recovery processes.</p>\n<h2>A08 Software or data integrity failures</h2>\n<p>Software or data integrity failures occur when code, updates or data can be changed without reliable detection.</p>\n<p>Examples include trusting unsigned updates, deserialising untrusted data, running unverified CI artefacts or allowing unprotected changes to critical 
configuration.</p>\n<p>Prevent it with signed artefacts, protected pipelines, integrity checks, careful deserialisation and change control for critical data.</p>\n<h2>A09 Security logging and alerting failures</h2>\n<p>Security logging and alerting failures mean attacks can happen without timely detection.</p>\n<p>Examples include missing audit logs for login, privilege changes, access denial, payment events or administrative actions. Logs that exist but are never monitored have limited value.</p>\n<p>Prevent it by logging security relevant events, protecting logs from tampering, setting useful alerts and testing incident response.</p>\n<h2>A10 Mishandling of exceptional conditions</h2>\n<p>Mishandling of exceptional conditions means the application behaves unsafely when something unexpected happens.</p>\n<p>Examples include exposing stack traces, continuing after a failed security check, leaking sensitive values in errors, failing open when a dependency is unavailable, or leaving partial state after a failed transaction.</p>\n<p>Prevent it with explicit error handling, safe defaults, tested failure paths and consistent user facing error messages.</p>\n<h2>How to use the list</h2>\n<p>The Top 10 is not a complete security programme. It is a starting point for awareness, prioritisation and review.</p>\n<p>Use it to guide training, code review, threat modelling and testing. Then add controls that match your application, data, users, architecture and deployment environment.</p>\n<h2>Conclusion</h2>\n<p>The OWASP Top 10 gives teams a shared language for common web application risks. The value is not memorising the names. The value is using the categories to find real weaknesses before attackers do.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web",
        "api"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/the-twelve-factor-app-revisited",
      "url": "https://soulstack.co.uk/blog/the-twelve-factor-app-revisited",
      "title": "The twelve-factor app, revisited",
      "summary": "The twelve-factor app remains a useful checklist for deployable web services, but it should not be treated as a complete architecture model. Its value is in the constraints it mak…",
      "content_html": "<p>The twelve-factor app remains a useful checklist for deployable web services, but it should not be treated as a complete architecture model. Its value is in the constraints it makes explicit: one codebase, explicit dependencies, environment based configuration, replaceable backing services, separated build and run stages, stateless processes, port binding, process based scaling, fast start and shutdown, environment parity, log streams, and one-off admin processes.</p>\n<h2>What still holds</h2>\n<p>The strongest parts of twelve-factor are still practical because they reduce hidden coupling. A service that stores configuration in the environment can move between deployments without a rebuild. A service that treats backing services as attached resources can swap a database, cache, queue, or SMTP provider by changing configuration rather than code. A service that writes logs to standard output leaves collection, retention, search, and alerting to the runtime platform.</p>\n<p>The process model also still matters. Running the application as one or more stateless processes makes horizontal scaling simpler. State that must survive a process restart belongs in a backing service, not in local memory or the local filesystem. That does not mean every application has to be stateless in the business sense. It means runtime processes should be replaceable.</p>\n<h2>What needs revisiting</h2>\n<p>The original model predates much of the current platform stack. Containers, service meshes, managed identity, policy as code, progressive delivery, distributed tracing, and managed observability are now normal in many production systems. These do not invalidate twelve-factor. They add detail that the original factors do not cover.</p>\n<p>The config factor needs extra care. Environment variables are useful for deployment specific values, but they are not a complete secrets management system. 
Sensitive values need controlled storage, rotation, access auditing, and least privilege. Environment variables can be part of the delivery mechanism, but the security property comes from the surrounding platform and operational process.</p>\n<p>The backing services factor also needs precision. Treating databases, caches, and queues as attached resources is a good abstraction for deployability. It is not a licence to ignore data ownership, transaction boundaries, latency, consistency, recovery, or migration strategy. A database is not interchangeable in the same way as a log drain.</p>\n<h2>Build, release, and run</h2>\n<p>Separating build, release, and run is still one of the most useful deployment disciplines. The build stage turns the code repository into an immutable artefact. The release stage combines that artefact with configuration. The run stage executes the release. A release cannot be mutated once it is created, so any change produces a new release. This separation makes rollbacks, promotion between environments, and incident analysis easier because the exact artefact and configuration can be identified.</p>\n<p>A common failure mode is rebuilding for each environment. That hides differences between staging and production inside build output. Prefer one artefact promoted through environments, with environment specific configuration applied at release or run time.</p>\n<h2>Disposability and shutdown</h2>\n<p>Fast startup and graceful shutdown are not cosmetic details. They affect deploy speed, autoscaling, crash recovery, and maintenance. A process should start quickly enough for the platform to replace failed capacity. It should also stop accepting new work during shutdown, finish or safely hand off in-flight work, and release leases or locks.</p>\n<p>Background workers need the same treatment as HTTP servers. 
A worker that pulls a job from a queue and then exits without clear acknowledgement rules can lose work or duplicate work, depending on the broker and worker configuration. Shutdown behaviour should be tested, not assumed.</p>\n<h2>Dev, staging, and production parity</h2>\n<p>Parity reduces defects that only appear after deployment. It does not require every environment to have the same scale or cost profile. It means the same application artefact, the same dependency classes, similar configuration mechanisms, and similar operational paths.</p>\n<p>A local database running in a container can be useful, but it is not equivalent to a managed production database with different extensions, isolation settings, backup behaviour, connection limits, and failure modes. The more a feature depends on platform behaviour, the more important it is to test that feature in an environment that resembles production.</p>\n<h2>What twelve-factor does not solve</h2>\n<p>Twelve-factor does not define service boundaries. It does not decide whether a system should be a monolith, modular monolith, or microservice architecture. It does not replace threat modelling, capacity planning, reliability engineering, schema design, data governance, or cost control.</p>\n<p>It is best used as a deployability baseline. A system can satisfy twelve-factor and still be unreliable. It can also violate one factor deliberately for a sound reason. The important part is that the trade-off is explicit.</p>\n<h2>Practical checklist</h2>\n<p>Use the following checks during design and review.</p>\n<h3>Code and artefacts</h3>\n<p>There is one codebase per deployable service. Builds are repeatable. The same artefact is promoted through environments.</p>\n<h3>Configuration</h3>\n<p>Deployment specific values are outside the codebase. Secrets are handled by a controlled secrets mechanism. 
Configuration changes are reviewable and auditable.</p>\n<h3>Runtime</h3>\n<p>Processes are stateless where possible, start quickly, shut down cleanly, and expose health signals that match real readiness.</p>\n<h3>Operations</h3>\n<p>Logs are emitted as event streams. Admin tasks run with the same code and configuration model as the application. Development, staging, and production stay close enough to catch integration defects early.</p>\n<h2>Conclusion</h2>\n<p>The twelve-factor app is still useful when it is treated as a deployability checklist. Its limits are just as important as its rules. Use it to remove hidden deployment assumptions, then add the missing production concerns: security, observability, data ownership, resilience, and operational readiness.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "devops",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/the-unix-commands-every-developer-should-know",
      "url": "https://soulstack.co.uk/blog/the-unix-commands-every-developer-should-know",
      "title": "The Unix commands every developer should know",
      "summary": "Unix command line tools are small, composable programs for inspecting files, transforming text, moving data and controlling processes. You do not need to memorise every option, bu…",
      "content_html": "<p>Unix command line tools are small, composable programs for inspecting files, transforming text, moving data and controlling processes. You do not need to memorise every option, but you should know the reliable defaults, the sharp edges and when to compose tools through pipes. This is a broad primer covering the commands you will reach for most days.</p>\n<h2>Start with navigation and inspection</h2>\n<p>Use <code>pwd</code>, <code>ls</code>, <code>cd</code> and <code>stat</code> to understand where you are and what a file is before changing it. Prefer explicit paths in scripts and automation. In an interactive shell, relative paths are convenient. In a script, they are a common source of accidental writes.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79C0FF\">pwd</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">ls</span><span style=\"color:#79C0FF\"> -la</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">stat</span><span style=\"color:#A5D6FF\"> README.md</span></span></code></pre></div><p>Use <code>file</code> when the extension is not enough. 
It reads file content and reports a detected type, which helps before piping binary data through text tools.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">file</span><span style=\"color:#A5D6FF\"> archive.tar.gz</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">file</span><span style=\"color:#A5D6FF\"> ./bin/tool</span></span></code></pre></div><p>Use <code>readlink</code> or <code>realpath</code> when symbolic links matter. <code>readlink</code> reports a link target. <code>realpath</code> resolves a path to an absolute canonical name, expanding symbolic links and removing <code>.</code> and <code>..</code> components.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">readlink</span><span style=\"color:#A5D6FF\"> ~/.config/editor/config</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">realpath</span><span style=\"color:#A5D6FF\"> ./src/../src/index.ts</span></span></code></pre></div><h2>Read files without editing them</h2>\n<p>Use <code>cat</code> for short files and for feeding data into another command. 
Use <code>less</code> for reading longer output, because it does not require loading the whole result into the terminal scrollback.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">cat</span><span style=\"color:#A5D6FF\"> package.json</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">less</span><span style=\"color:#A5D6FF\"> logs/app.log</span></span></code></pre></div><p>Use <code>head</code> and <code>tail</code> to sample. <code>tail -f</code> follows appended data, which is useful for logs.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">head</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#79C0FF\"> 20</span><span style=\"color:#A5D6FF\"> logs/app.log</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">tail</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#79C0FF\"> 50</span><span style=\"color:#A5D6FF\"> logs/app.log</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">tail</span><span style=\"color:#79C0FF\"> -f</span><span style=\"color:#A5D6FF\"> logs/app.log</span></span></code></pre></div><p>Use <code>wc</code> to count lines, words and bytes. 
In build and data checks, <code>wc -l</code> is often enough to catch an empty or unexpectedly large result.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">wc</span><span style=\"color:#79C0FF\"> -l</span><span style=\"color:#A5D6FF\"> src/</span><span style=\"color:#79C0FF\">**</span><span style=\"color:#A5D6FF\">/</span><span style=\"color:#79C0FF\">*</span><span style=\"color:#A5D6FF\">.ts</span></span></code></pre></div><p>The <code>**</code> pattern recurses into subdirectories only in shells that support it. In Bash, enable it with <code>shopt -s globstar</code>; otherwise <code>**</code> behaves like a single <code>*</code>.</p>\n<h2>Move, copy and remove files deliberately</h2>\n<p>Use <code>cp</code>, <code>mv</code>, <code>mkdir</code> and <code>rm</code> with explicit operands. For directories, use <code>mkdir -p</code> when intermediate directories may not exist.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">mkdir</span><span style=\"color:#79C0FF\"> -p</span><span style=\"color:#A5D6FF\"> dist/assets</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">cp</span><span style=\"color:#A5D6FF\"> README.md</span><span style=\"color:#A5D6FF\"> dist/README.md</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">mv</span><span style=\"color:#A5D6FF\"> dist/README.md</span><span style=\"color:#A5D6FF\"> dist/readme.txt</span></span></code></pre></div><p>Be careful with <code>rm -r</code>, which removes directory trees. The <code>--</code> marker tells the command to stop parsing options, which matters when a path may begin with a hyphen. 
Safe deletion in scripts is covered in more depth in the post on writing safe, readable Bash scripts.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">rm</span><span style=\"color:#79C0FF\"> -rf</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$target</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">rm</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"-temporary-file\"</span></span></code></pre></div><h2>Search names and content</h2>\n<p>Use <code>find</code> when the primary question is about paths or file metadata, and <code>grep</code> when the primary question is about matching lines in text.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.md'</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> -R</span><span style=\"color:#A5D6FF\"> \"TODO\"</span><span style=\"color:#A5D6FF\"> src</span></span></code></pre></div><p>Use <code>find</code> with <code>-exec ... {} +</code> to run a command once over a batch of matched paths rather than spawning a new process per file. 
For more on fast searching, see the post on finding things fast.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.log'</span><span style=\"color:#79C0FF\"> -exec</span><span style=\"color:#A5D6FF\"> wc</span><span style=\"color:#79C0FF\"> -l</span><span style=\"color:#A5D6FF\"> {}</span><span style=\"color:#A5D6FF\"> +</span></span></code></pre></div><h2>Transform text in pipelines</h2>\n<p>Use <code>sort</code> to order lines. Set <code>LC_ALL=C</code> when bytewise, reproducible ordering matters more than locale-aware collation, since this forces byte-order comparison rather than the rules of the current locale.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">LC_ALL</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">C</span><span style=\"color:#FFA657\"> sort</span><span style=\"color:#A5D6FF\"> names.txt</span></span></code></pre></div><p>Use <code>uniq</code> only after sorting, unless you only want to collapse adjacent duplicates.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span 
class=\"line\"><span style=\"color:#E6EDF3\">LC_ALL</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">C</span><span style=\"color:#FFA657\"> sort</span><span style=\"color:#A5D6FF\"> names.txt</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> uniq</span><span style=\"color:#79C0FF\"> -c</span></span></code></pre></div><p>Use <code>cut</code> for simple column extraction from delimited text. Use <code>awk</code> when the transformation needs fields, conditions or arithmetic.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">cut</span><span style=\"color:#79C0FF\"> -d</span><span style=\"color:#A5D6FF\"> ':'</span><span style=\"color:#79C0FF\"> -f</span><span style=\"color:#79C0FF\"> 1</span><span style=\"color:#A5D6FF\"> /etc/passwd</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">awk</span><span style=\"color:#79C0FF\"> -F</span><span style=\"color:#A5D6FF\"> ','</span><span style=\"color:#A5D6FF\"> '$3 == \"active\" { print $1 }'</span><span style=\"color:#A5D6FF\"> users.csv</span></span></code></pre></div><p>Use <code>sed</code> for simple stream edits, especially substitution. 
Keep complex parsing out of <code>sed</code> and use a real parser for structured formats.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">sed</span><span style=\"color:#A5D6FF\"> 's/foo/bar/g'</span><span style=\"color:#A5D6FF\"> input.txt</span></span></code></pre></div><h2>Inspect processes and resources</h2>\n<p>Use <code>ps</code> to inspect processes, <code>kill</code> to send signals and <code>df</code> or <code>du</code> to inspect storage.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ps</span><span style=\"color:#A5D6FF\"> aux</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> grep</span><span style=\"color:#A5D6FF\"> '[n]ode'</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">df</span><span style=\"color:#79C0FF\"> -h</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">du</span><span style=\"color:#79C0FF\"> -sh</span><span style=\"color:#A5D6FF\"> node_modules</span></span></code></pre></div><p>Use <code>env</code> to inspect environment variables and to run a command with a controlled environment.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">env</span><span style=\"color:#FF7B72\"> |</span><span 
style=\"color:#FFA657\"> sort</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">env</span><span style=\"color:#A5D6FF\"> NODE_ENV=production</span><span style=\"color:#A5D6FF\"> node</span><span style=\"color:#A5D6FF\"> server.js</span></span></code></pre></div><h2>Compose commands carefully</h2>\n<p>Pipes connect the standard output of one command to the standard input of the next. Redirection writes output to files.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">grep</span><span style=\"color:#79C0FF\"> -R</span><span style=\"color:#A5D6FF\"> \"deprecated\"</span><span style=\"color:#A5D6FF\"> src</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> wc</span><span style=\"color:#79C0FF\"> -l</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">sort</span><span style=\"color:#A5D6FF\"> input.txt</span><span style=\"color:#FF7B72\"> ></span><span style=\"color:#A5D6FF\"> sorted.txt</span></span></code></pre></div><p>Use <code>tee</code> when you need to see output and write it to a file.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> test</span><span style=\"color:#FF7B72\"> 2>&#x26;1</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> tee</span><span style=\"color:#A5D6FF\"> test.log</span></span></code></pre></div><p>By default the exit status of a pipeline is the exit status of its last command, so a failure earlier in 
the pipeline can go unnoticed. In Bash scripts, use <code>set -o pipefail</code> so the pipeline reports a non-zero status if any command in it fails.</p>\n<h2>Prefer boring commands in automation</h2>\n<p>Interactive shortcuts are fine at the prompt. Scripts should be more conservative. Quote variables, use explicit paths, choose commands with stable semantics and make destructive operations easy to review.</p>\n<p>A good habit is to ask three questions before running a command: what input will it read, what output will it write and what will happen if a filename contains whitespace or starts with a hyphen.</p>\n<h2>Conclusion</h2>\n<p>The most useful Unix commands are not obscure. They are the small tools that let you inspect state, filter text, move files and compose repeatable pipelines. Learn the common cases first, then learn the safety rules that keep those commands predictable in scripts and production shells.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "cli",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/time-zones-dates-and-utc-getting-it-right",
      "url": "https://soulstack.co.uk/blog/time-zones-dates-and-utc-getting-it-right",
      "title": "Time zones, dates and UTC: getting it right",
      "summary": "Date and time bugs happen when code mixes different concepts: an instant, a local date, a local time, a time zone, and an offset. Treat them as separate values and many failures d…",
      "content_html": "<p>Date and time bugs happen when code mixes different concepts: an instant, a local date, a local time, a time zone, and an offset. Treat them as separate values and many failures disappear.</p>\n<h2>Use the right model</h2>\n<p>An instant is a single point on the timeline. It can be stored and compared without knowing where the user is.</p>\n<p>A local date is a calendar date without a time or time zone, such as a birthday or a reporting day. It should not be converted to midnight UTC unless the domain really means that instant.</p>\n<p>A local date and time is a wall-clock value, such as 2026-10-25 01:30. It may be ambiguous or invalid in a time zone with daylight saving transitions.</p>\n<p>A time zone is a ruleset for a region, such as Europe/London. It is not the same as an offset. A zone can use different offsets at different times.</p>\n<p>An offset is the difference from UTC at one instant, such as +00:00 or +01:00. It does not contain future daylight saving rules.</p>\n<h2>Store instants for events that happened</h2>\n<p>For events that happened at a specific moment, store an instant. Examples include audit entries, login attempts, payment captures, build starts, queue message creation, and log records.</p>\n<p>Serialise instants with an explicit offset. For internet timestamps, RFC 3339 defines a widely used profile of ISO 8601. 
UTC is commonly written with <code>Z</code>.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">{</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">  \"createdAt\"</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">\"2026-06-04T10:15:30Z\"</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Do not serialise a timestamp without an offset unless the receiving system explicitly expects a local date and time. A value such as <code>2026-06-04T10:15:30</code> is not enough to identify an instant.</p>\n<h2>Keep local dates as local dates</h2>\n<p>Some values are dates, not instants. Examples include birthdays, licence renewal days, holiday dates, invoice dates, and calendar days in a user&#39;s locale.</p>\n<p>Store these as dates when the domain is date-based.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">{</span></span>\n<span class=\"line\"><span style=\"color:#7EE787\">  \"renewalDate\"</span><span style=\"color:#E6EDF3\">: </span><span style=\"color:#A5D6FF\">\"2026-06-04\"</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Converting a date to midnight UTC can move it to the previous or next local day for users in other time zones. 
That is a modelling error, not a formatting issue.</p>\n<h2>Store the time zone when future local time matters</h2>\n<p>Future scheduled events often need a time zone, not just an instant. A meeting scheduled for 09:00 in Europe/London should remain 09:00 local time even if the offset changes between winter and summer.</p>\n<p>For future human schedules, store:</p>\n<ul>\n<li>The local date.</li>\n<li>The local time.</li>\n<li>The IANA time zone identifier.</li>\n<li>The resolved instant when needed for execution.</li>\n</ul>\n<p>This preserves user intent and still allows systems to trigger jobs at the correct instant.</p>\n<h2>Do not treat offsets as time zones</h2>\n<p>The offset <code>+01:00</code> tells you the offset for one instant. It does not tell you whether the location is Europe/London, Europe/Paris, Africa/Lagos, or another region using the same offset at that moment.</p>\n<p>If the application needs daylight saving behaviour, legal local time, or future scheduling, store a time zone identifier from the IANA time zone database.</p>\n<p>The IANA time zone database is updated when political bodies change time zone boundaries or daylight saving rules. Keep runtime time zone data up to date, especially for scheduling systems.</p>\n<h2>Handle daylight saving gaps and repeats</h2>\n<p>Daylight saving transitions create two important cases.</p>\n<p>A gap is a local time that does not exist because clocks move forward. A repeat is a local time that occurs twice because clocks move back.</p>\n<p>Code that accepts local date and time input must define a policy. It can reject invalid times, ask the user to choose, or apply a documented disambiguation rule. Silent conversion is dangerous because it hides a business decision in library behaviour.</p>\n<p>Test around transitions for the zones your users use. 
Do not only test UTC.</p>\n<h2>Be explicit at system boundaries</h2>\n<p>Every API, event, database column, and log field should make its date and time meaning clear. Names help.</p>\n<p>Use names such as:</p>\n<ul>\n<li><code>createdAt</code> for an instant.</li>\n<li><code>localDate</code> for a date without time.</li>\n<li><code>timeZone</code> for an IANA zone identifier.</li>\n<li><code>startsAt</code> for a resolved instant.</li>\n<li><code>startsOn</code> for a local date.</li>\n</ul>\n<p>Avoid names such as <code>date</code>, <code>time</code>, or <code>timestamp</code> when they do not specify the model.</p>\n<h2>Use libraries that expose the distinction</h2>\n<p>Prefer date and time APIs that distinguish exact time, local date, local time, and zoned date time. In JavaScript, the Temporal proposal is designed around separate types for date-only, instant, and zoned date-time use cases. In other ecosystems, choose equivalent types rather than forcing every value through a single timestamp class.</p>\n<p>The important rule is conceptual, not language-specific: do not use one type for every temporal value.</p>\n<h2>Log in UTC, display in the user&#39;s context</h2>\n<p>Operational logs are easiest to correlate when they use UTC instants. User interfaces are easiest to understand when they display dates and times in the user&#39;s expected locale and time zone.</p>\n<p>Keep the stored value and displayed value separate. Formatting is a presentation step. It should not change the underlying instant.</p>\n<h2>Test with real edge cases</h2>\n<p>Add tests for:</p>\n<ul>\n<li>A normal day.</li>\n<li>A daylight saving gap.</li>\n<li>A daylight saving repeat.</li>\n<li>A leap year date.</li>\n<li>A date-only value for a user west of UTC.</li>\n<li>A date-only value for a user east of UTC.</li>\n<li>Serialisation with an explicit offset.</li>\n<li>Parsing invalid or offset-less input.</li>\n</ul>\n<p>Use named zones, not only fixed offsets. 
Fixed offsets do not exercise daylight saving rules.</p>\n<h2>Conclusion</h2>\n<p>Getting dates right starts with modelling. Store instants for events that happened, keep local dates as dates, store an IANA time zone for future local schedules, and never confuse an offset with a zone. UTC is excellent for instants, but it is not a replacement for local date and time semantics.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "api",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/understanding-tls-and-certificates",
      "url": "https://soulstack.co.uk/blog/understanding-tls-and-certificates",
      "title": "Understanding TLS and certificates",
      "summary": "TLS protects data moving across an untrusted network. On the web, HTTPS is HTTP over TLS, and certificates are the mechanism that lets a client authenticate the server name before…",
      "content_html": "<p>TLS protects data moving across an untrusted network. On the web, HTTPS is HTTP over TLS, and certificates are the mechanism that lets a client authenticate the server name before sending sensitive requests. This post goes deeper on how that protection works and where it stops.</p>\n<h2>What TLS provides</h2>\n<p>TLS is designed to provide confidentiality, integrity and authentication. Confidentiality means network observers cannot read the protected traffic. Integrity means traffic cannot be changed without detection. Authentication means the client can verify that it is talking to the expected server.</p>\n<p>TLS does not make an application secure by itself. It protects the connection. Application authentication, authorisation, input handling and session management still matter.</p>\n<h2>The certificate proves a name binding</h2>\n<p>A server certificate binds a public key to one or more domain names. During the TLS handshake the server proves possession of the private key that matches the certificate public key. The client then validates the certificate chain and checks that the certificate is valid for the requested host name.</p>\n<p>Most web certificates use the Web PKI. A certificate is trusted because it chains back to a root certificate that the client already trusts.</p>\n<h2>Certificate chains</h2>\n<p>Servers normally send their leaf certificate and one or more intermediate certificates. The client uses those certificates to build a path to a trusted root. The root certificate itself is usually already in the client&#39;s trust store and does not need to be sent by the server.</p>\n<p>A missing intermediate certificate can break validation for some clients. 
A certificate that is expired, not yet valid, revoked by policy or issued for the wrong name should be rejected.</p>\n<h2>The TLS handshake in plain terms</h2>\n<p>The client connects and sends a ClientHello with supported protocol versions, cipher suites and extensions such as the server name. The server replies with selected parameters and its certificate. The parties establish shared keys and then protect application traffic with those keys.</p>\n<p>TLS 1.3 simplified the handshake compared with earlier versions and removed many older cryptographic options, including legacy cipher suites and static key exchange. Current deployments should prefer TLS 1.3 and keep TLS 1.2 only where needed for compatible clients.</p>\n<h2>SNI and hostname validation</h2>\n<p>Server Name Indication, usually called SNI, lets the client tell the server which host name it wants during the TLS handshake. That allows one IP address to serve certificates for many names. In the original design the SNI value is sent in cleartext, so a passive observer can see which name the client asked for.</p>\n<p>Hostname validation is separate from SNI. The client must still check that the certificate covers the URL host. A valid certificate for another domain must not be accepted.</p>\n<h2>Certificate authorities and domain validation</h2>\n<p>A certificate authority issues a certificate after validating the applicant according to the certificate type. Domain Validation certificates prove control over the domain name. Organisation Validation and Extended Validation add checks about the organisation, but modern browser UI does not make them a substitute for application trust decisions.</p>\n<p>Automated certificate issuance has made HTTPS easier to operate. Automation matters because certificates expire and renewal failures cause outages.</p>\n<h2>Expiry and renewal</h2>\n<p>Certificates have finite validity periods. 
Shorter validity reduces the useful life of a compromised certificate and pushes operators towards automation. Renew certificates automatically and monitor expiry from outside the deployment path.</p>\n<p>Do not wait until the final day to renew. Renewal should happen early enough to allow for DNS issues, CA issues, deployment failures and human review when automation breaks.</p>\n<h2>What TLS does not hide</h2>\n<p>TLS protects the request and response content, including paths and headers, after the handshake is complete. It does not hide the IP address being contacted. Depending on the protocol and deployment, observers may still infer or see the server name, certificate metadata, timing and traffic sizes.</p>\n<p>Encrypted DNS and newer TLS extensions can reduce some metadata exposure, but they do not make network activity invisible.</p>\n<h2>Operational guidance</h2>\n<p>Serve HTTPS everywhere and redirect HTTP to HTTPS. Enable HTTP Strict Transport Security only after confirming that all required subdomains can serve HTTPS correctly. Use modern TLS configuration, disable obsolete protocol versions and weak cipher suites, and keep certificate chains complete.</p>\n<p>Protect private keys. If a private key is exposed, replace the certificate and key pair. Do not share one wildcard certificate and key across unrelated systems unless the operational risk has been reviewed.</p>\n<h2>Debugging TLS problems</h2>\n<p>Check the name, chain, validity period and protocol support. Confirm that the server presents the expected certificate for the requested host name, especially when virtual hosting and SNI are involved.</p>\n<p>Then test from the affected environment. Corporate proxies, old clients, missing trust anchors and middleboxes can produce failures that do not reproduce from a developer laptop.</p>\n<h2>Conclusion</h2>\n<p>TLS gives the web a protected transport layer, and certificates let clients authenticate server names. 
Reliable HTTPS depends on correct certificate chains, hostname validation, modern protocol settings, private key protection, automated renewal and monitoring.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "security",
        "web",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/webhooks-delivery-retries-and-verifying-signatures",
      "url": "https://soulstack.co.uk/blog/webhooks-delivery-retries-and-verifying-signatures",
      "title": "Webhooks: delivery, retries and verifying signatures",
      "summary": "A webhook is an HTTP callback from one system to another. It is simple to start, but production webhook handling needs explicit rules for delivery, retries, ordering, deduplicatio…",
      "content_html": "<p>A webhook is an HTTP callback from one system to another. It is simple to start, but production webhook handling needs explicit rules for delivery, retries, ordering, deduplication, and signature verification.</p>\n<h2>Assume at-least-once delivery</h2>\n<p>Webhook providers usually optimise for delivery rather than exactly-once processing. If your endpoint times out, returns a failing status, or is unreachable, the provider may retry the same event. The same event can also arrive more than once because of network failures or provider-side retries.</p>\n<p>Design every webhook handler to be idempotent. Store the provider event ID before processing. If the event has already been handled, return a successful response without repeating the side effect.</p>\n<h2>Acknowledge quickly and process asynchronously</h2>\n<p>The webhook endpoint should do the minimum work needed to authenticate the request, validate the event shape, persist the event, and enqueue processing. Then it should return a <code>2xx</code> response.</p>\n<p>Do not call several downstream services before acknowledging the webhook. Long synchronous handlers increase timeout risk, which increases duplicate delivery. Use a queue or durable job table for work that can take time.</p>\n<h2>Verify signatures before trusting the body</h2>\n<p>Treat the webhook request as untrusted until verification succeeds. A typical provider signs the raw request body with a shared secret and sends the signature in a header. Verification must use the raw bytes received by the endpoint. Re-serialising parsed JSON can change whitespace or ordering and break verification.</p>\n<p>Use the provider&#39;s official library when one exists. It will usually handle timestamp tolerance, multiple signatures during secret rotation, and algorithm details. 
If you implement verification yourself, compute an HMAC of the raw body with the shared secret, compare signatures using a constant-time comparison, and reject stale timestamps.</p>\n<h2>Rotate secrets safely</h2>\n<p>Webhook signing secrets should be stored like credentials. Do not log them, embed them in client code, or share them across unrelated environments. Production, staging, and development should use separate secrets.</p>\n<p>Secret rotation should allow an overlap period where both the old and new secret can verify incoming events. After the provider stops signing with the old secret, remove it from the receiver.</p>\n<h2>Validate event schema and type</h2>\n<p>Signature verification proves that the request came from someone with the secret. It does not prove that your code can process every event type safely. Check the event type, version if present, account or tenant identifier, and required fields before processing.</p>\n<p>Ignore or store unknown event types rather than failing the whole endpoint. Providers add new event types over time. A receiver should not break because it subscribed broadly or because the provider expanded its catalogue.</p>\n<h2>Handle ordering explicitly</h2>\n<p>Do not assume events arrive in the order they happened. Retries, parallel delivery, provider partitions, and network delays can reorder events. If order matters, use the event timestamp, sequence number, resource version, or fetch the current resource state from the provider before applying a change.</p>\n<p>For many integrations, a fetch-before-process pattern is safer than trusting the event payload as the final state. The webhook tells you that something changed. The provider API tells you what the state is now.</p>\n<h2>Return the right status</h2>\n<p>Return <code>2xx</code> only when the event has been accepted for processing. Return <code>400</code> for malformed requests that should not be retried. 
Return <code>401</code> or <code>403</code> when authentication or signature verification fails. Return <code>5xx</code> only for temporary receiver failures where a retry is useful.</p>\n<p>Be careful when rate limiting a webhook provider. A <code>429</code> can trigger retries and worsen a backlog. When possible, accept, persist, and process later rather than rejecting bursts from a trusted provider.</p>\n<h2>Keep observability per event</h2>\n<p>Log the provider event ID, event type, delivery attempt if available, verification result, processing status, and correlation ID. Metrics should show received, verified, deduplicated, queued, processed, failed, and retried counts.</p>\n<p>Provide an operator path to replay stored events safely. Replay must use the same idempotency checks as live delivery.</p>\n<h2>Test with real provider tooling</h2>\n<p>Use provider dashboards, CLIs, fixtures, and local forwarding tools to test signature verification and retry behaviour. Unit tests with hand-written JSON are useful, but they do not prove that raw-body handling, headers, timestamps, and endpoint status codes work with the real provider.</p>\n<p>Document how to add a new event type, how to rotate secrets, how to replay an event, and how to disable processing during an incident without losing incoming events.</p>\n<h2>Conclusion</h2>\n<p>Reliable webhooks are built around at-least-once delivery, fast acknowledgement, durable storage, idempotent processing, signature verification, explicit ordering rules, and event-level observability. The endpoint is not just a controller. It is the boundary between two distributed systems.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "api",
        "security",
        "reliability",
        "architecture"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/when-to-split-a-monolith-and-when-not-to",
      "url": "https://soulstack.co.uk/blog/when-to-split-a-monolith-and-when-not-to",
      "title": "When to split a monolith (and when not to)",
      "summary": "Splitting a monolith is an architectural trade-off, not a maturity badge. A monolith can be simple, fast to change, and reliable when it has clear internal boundaries. Microservic…",
      "content_html": "<p>Splitting a monolith is an architectural trade-off, not a maturity badge. A monolith can be simple, fast to change, and reliable when it has clear internal boundaries. Microservices can improve autonomy and scaling, but they also add network calls, distributed data, deployment coordination, observability requirements, and operational cost.</p>\n<h2>Start with the problem</h2>\n<p>Do not split because the codebase is large. Split because there is a specific pressure that a single deployable unit cannot handle well.</p>\n<p>Good reasons include independent delivery needs, different scaling profiles, clear ownership boundaries, conflicting reliability requirements, or a domain area that changes at a different pace from the rest of the system. Weak reasons include fashion, resume driven design, or the hope that network boundaries will fix poor modularity.</p>\n<h2>Keep the monolith when it is working</h2>\n<p>A monolith is often the right choice when the team is small, the domain is still changing quickly, and service boundaries are not yet clear. In that stage, a distributed design can freeze bad boundaries early and make refactoring harder.</p>\n<p>A modular monolith can give many of the benefits of separation without the operational cost of distributed services. Use explicit modules, clear ownership, internal APIs, dependency rules, and separate tests. Enforce boundaries in code review and tooling. If the team cannot keep boundaries inside one process, it will usually struggle to keep them across a network.</p>\n<h2>Signals that a split may help</h2>\n<p>A split may be justified when one part of the system needs to deploy frequently while the rest must remain stable. It may help when one capability needs very different resource scaling, for example CPU heavy processing separate from normal request handling. 
It may help when a component has a distinct data model, a stable interface, and a team ready to own it from code to operations.</p>\n<p>A split can also reduce blast radius. If a non-critical capability fails often and currently affects critical paths, extracting it behind a resilient interface can make the overall system safer. This only works when the new service boundary reduces coupling rather than moving it into synchronous calls and shared databases.</p>\n<h2>Signals that a split is premature</h2>\n<p>A split is premature when the team cannot describe the boundary in business terms. It is also premature when the proposed service needs direct access to many tables owned by the monolith, shares domain objects freely, or requires coordinated deployments for routine changes.</p>\n<p>Other warning signs include no automated deployment, weak observability, no centralised logging, no tracing across calls, no clear service ownership, and no tested rollback process. Microservices need these capabilities early. Without them, incidents become harder to understand and slower to recover.</p>\n<h2>Data is the hard part</h2>\n<p>Code is usually easier to split than data. A service boundary is weak if the new service and the monolith keep writing the same tables. Shared databases preserve coupling while adding network and deployment complexity.</p>\n<p>Prefer boundaries where a service can own its data and expose behaviour through an API or event stream. When that is not possible, plan the migration explicitly. Transitional patterns can include read replicas, change data capture, synchronisation events, dual writes with reconciliation, or an anti-corruption layer. Each choice has failure modes.</p>\n<h2>Use incremental migration</h2>\n<p>A big rewrite is rarely the safest path. The strangler fig pattern replaces selected behaviour gradually while the old and new systems coexist. 
Routing, facades, events, or API gateways can direct part of the traffic to the new implementation while the monolith continues to serve the rest.</p>\n<p>Start with a capability that has clear boundaries and measurable value. Avoid extracting the most central and tangled component first. A successful first extraction should prove deployment, monitoring, rollback, data ownership, and on-call readiness.</p>\n<h2>Define the service contract</h2>\n<p>Before extraction, define the contract. Include inputs, outputs, error behaviour, timeouts, retry guidance, idempotency, authentication, authorisation, versioning, and ownership. A service is not independent if every change requires lockstep changes in its consumers.</p>\n<p>Keep contracts small and business-focused. Do not expose the new service as a remote version of its internal tables. That creates a distributed monolith.</p>\n<h2>Operational readiness</h2>\n<p>Each new service needs build pipelines, deployment automation, metrics, logs, traces, alerts, dashboards, runbooks, capacity limits, dependency maps, and security controls. It also needs an owner who can respond when it fails.</p>\n<p>This cost is acceptable when the service provides enough autonomy, resilience, or scaling benefit. 
It is waste when the split only adds another place for the same team to deploy the same change.</p>\n<h2>Decision checklist</h2>\n<p>Split when most of the following are true.</p>\n<h3>Boundary</h3>\n<p>The capability has a clear business boundary, stable language, and limited coupling to the rest of the system.</p>\n<h3>Ownership</h3>\n<p>A team can own the service, its data, its reliability, and its deployments.</p>\n<h3>Benefit</h3>\n<p>The split improves delivery speed, scaling, resilience, compliance, or cost in a measurable way.</p>\n<h3>Readiness</h3>\n<p>The platform supports automated deployment, observability, rollback, secrets, and access control.</p>\n<h3>Migration</h3>\n<p>There is an incremental path that avoids a high-risk rewrite.</p>\n<p>Do not split when the main problem is messy code, unclear domain modelling, weak tests, or poor deployment discipline. Fix those first.</p>\n<h2>Conclusion</h2>\n<p>Split a monolith when a clear boundary and measurable benefit justify the extra distributed systems cost. Keep it together when the domain is still fluid, the team is small, or the platform is not ready. A well-structured monolith is not a failure. A poorly separated set of services is still a monolith, only slower and harder to operate.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "architecture",
        "reliability",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/why-your-ci-pipeline-is-slow-fragile-and-lying-to-you",
      "url": "https://soulstack.co.uk/blog/why-your-ci-pipeline-is-slow-fragile-and-lying-to-you",
      "title": "Why your CI pipeline is slow, fragile, and lying to you",
      "summary": "A slow CI pipeline is visible. A fragile one is tolerated. A misleading one is dangerous. The worst pipelines do not just waste time. They give teams confidence that a change is s…",
      "content_html": "<p>A slow CI pipeline is visible. A fragile one is tolerated. A misleading one is dangerous. The worst pipelines do not just waste time. They give teams confidence that a change is safe when the checks are incomplete, noisy, or disconnected from production risk.</p>\n<h2>Speed is not the first problem</h2>\n<p>Pipeline duration matters because feedback delay changes behaviour. Engineers batch changes, defer tests, retry blindly, or merge with less context when feedback is slow. But speed alone is not the target.</p>\n<p>A fast pipeline that skips meaningful checks is worse than a slow one. A useful pipeline answers a specific question: is this change safe enough to progress to the next stage?</p>\n<p>That question requires a clear test strategy. Unit tests should protect local logic. Integration tests should protect contracts. Security checks should protect known classes of risk. Build steps should prove the artefact can be produced repeatably. Deployment checks should prove the artefact can move through the release path.</p>\n<h2>Fragility usually comes from hidden state</h2>\n<p>CI becomes fragile when jobs depend on mutable external state: shared databases, uncontrolled test data, floating dependencies, overloaded runners, undeclared services, or caches that are treated as correctness mechanisms.</p>\n<p>Caching is useful for performance, but it should not be required for correctness. A cache miss should make the job slower, not different. When a clean run and a cached run produce different results, the pipeline is hiding a dependency problem. This is the same reasoning behind hermetic builds, where the same inputs are expected to produce the same output regardless of the host.</p>\n<p>The same logic applies to test order, time zones, random ports, live third party APIs, and environment variables that exist on one runner but not another. 
A pipeline that only passes in one accidental environment is not a quality gate.</p>\n<h2>Retries are a signal, not a fix</h2>\n<p>Retries can reduce noise from transient infrastructure failures. They should not be used to normalise flaky tests. A flaky test is one that both passes and fails against the same source code, with no change to the code, the test, or the environment. A test that passes on the third attempt has still reported useful information: something is nondeterministic.</p>\n<p>Track retry rate separately from failure rate. A green build that needed multiple retries should not be treated as equivalent to a clean build. Flakiness consumes attention, weakens trust, and eventually trains teams to ignore red builds.</p>\n<h2>The pipeline may be lying about coverage</h2>\n<p>Many pipelines report success without checking the riskiest parts of a change. A service change may pass tests but never exercise database migrations. A frontend change may pass build checks but never validate accessibility or browser behaviour. An infrastructure change may validate syntax but not policy impact.</p>\n<p>The answer is not to add every possible check to every pull request. The answer is to classify changes and run the checks that match the risk. A documentation change should not wait behind a full production simulation. A permissions change should not skip policy review because unit tests passed.</p>\n<h2>CI should produce decisions</h2>\n<p>Good CI output is not a wall of logs. It is a decision record. What changed? What was checked? What was skipped, and why? Which artefact was produced? Which version of each tool ran? What evidence supports promotion?</p>\n<p>This matters for debugging and for audit. If a production incident traces back to a change, the pipeline should show what the system believed at the time of release.</p>\n<h2>Conclusion</h2>\n<p>A good CI pipeline is not simply fast. It is deterministic, targeted, and honest. 
It separates performance optimisations from correctness, treats flakiness as a defect, and gives teams evidence they can use. The goal is not a green badge. The goal is trustworthy feedback.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "devops",
        "reliability",
        "performance"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/writing-a-readme-people-actually-read",
      "url": "https://soulstack.co.uk/blog/writing-a-readme-people-actually-read",
      "title": "Writing a README people actually read",
      "summary": "A README is not a brochure. It is the first operational guide for a repository, so it should help a reader decide what the project is, whether it is relevant, and how to run or us…",
      "content_html": "<p>A README is not a brochure. It is the first operational guide for a repository, so it should help a reader decide what the project is, whether it is relevant, and how to run or use it without hunting through the tree.</p>\n<h2>Start with the reader&#39;s first question</h2>\n<p>A useful README answers the first question before anything else: what does this repository do?</p>\n<p>Open with a short description in plain language. Name the problem the project solves, the main audience, and the expected result. Keep it concrete. Avoid vague claims such as &quot;simple&quot;, &quot;powerful&quot;, &quot;modern&quot;, or &quot;production ready&quot; unless the rest of the README proves exactly what those words mean.</p>\n<p>A good opening usually covers:</p>\n<ul>\n<li>What the project is.</li>\n<li>Who it is for.</li>\n<li>What outcome it gives the reader.</li>\n<li>The current status, if it is experimental, deprecated, private, or incomplete.</li>\n</ul>\n<p>Do not make the reader infer purpose from package names, badges, directory names, or screenshots. Those are supporting details, not the explanation.</p>\n<h2>Put the shortest successful path near the top</h2>\n<p>The next section should give the shortest reliable path from clone to working result. This is not the place for every option. It is the place for the normal case that a competent newcomer should run first.</p>\n<p>Use commands that can be copied as written. Include required versions only when they matter and can be verified. 
Show expected output when it reduces ambiguity, especially for setup, tests, local servers, or generated files.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> install</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> test</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">npm</span><span style=\"color:#A5D6FF\"> run</span><span style=\"color:#A5D6FF\"> dev</span></span></code></pre></div><p>If setup depends on environment variables, configuration files, credentials, generated code, services, or local tooling, state that before the command sequence. A README that hides prerequisites creates wasted debugging time.</p>\n<h2>Explain the repository shape</h2>\n<p>After the quick start, describe the major directories. The goal is not to list every file. The goal is to give the reader a map.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>src/</span></span>\n<span class=\"line\"><span>  application code</span></span>\n<span class=\"line\"><span>tests/</span></span>\n<span class=\"line\"><span>  automated tests</span></span>\n<span class=\"line\"><span>docs/</span></span>\n<span class=\"line\"><span>  longer form documentation</span></span>\n<span class=\"line\"><span>scripts/</span></span>\n<span class=\"line\"><span>  repository maintenance scripts</span></span></code></pre></div><p>Keep descriptions functional. &quot;Contains helpers&quot; is rarely useful. 
&quot;Builds the search index used by the static generator&quot; is useful because it explains why the directory exists.</p>\n<p>When a repository has several entry points, identify the most important one. For a service this may be the server bootstrap. For a library it may be the public package export. For a command-line tool it may be the executable wrapper and the command handler.</p>\n<p>Treat this as a pointer, not a full tour. The detailed layout belongs in dedicated docs so the README stays short.</p>\n<h2>Separate usage from development</h2>\n<p>Usage instructions and development instructions answer different questions. Usage explains how to consume the project. Development explains how to change it safely.</p>\n<p>For a library, usage may include installation, import examples, public API links, and compatibility notes. For a service, usage may include local configuration, runtime dependencies, and deployment assumptions. For development, include test commands, lint commands, formatting commands, fixture generation, and common failure modes.</p>\n<p>Keep this distinction visible with headings. A reader who only wants to use the project should not have to read the contribution workflow. A maintainer who wants to patch the project should not have to reverse-engineer the test command.</p>\n<h2>Keep examples small and executable</h2>\n<p>Examples should be short enough to understand and complete enough to run. A fragment that omits imports, setup, or required context can be worse than no example, because it creates false confidence.</p>\n<p>Prefer one realistic example over several decorative ones. Name values clearly. Avoid placeholder names that look like real configuration. 
If an example is intentionally partial, say so before the block.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> { parseUserId } </span><span style=\"color:#FF7B72\">from</span><span style=\"color:#A5D6FF\"> \"./user-id\"</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">const</span><span style=\"color:#79C0FF\"> userId</span><span style=\"color:#FF7B72\"> =</span><span style=\"color:#D2A8FF\"> parseUserId</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#A5D6FF\">\"user_123\"</span><span style=\"color:#E6EDF3\">);</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">console.</span><span style=\"color:#D2A8FF\">log</span><span style=\"color:#E6EDF3\">(userId.value);</span></span></code></pre></div><p>Do not label non-shell content as bash. Configuration belongs in a single fenced block with the correct language label, not in a shell code group.</p>\n<h2>Link out instead of duplicating long documentation</h2>\n<p>A README should be a good front door, not the whole building. Link to longer files when details would bury the quick path. Good candidates include architecture notes, contributing rules, security policy, release process, and generated API references.</p>\n<p>Use descriptive link text. &quot;Read the contributing guide&quot; is better than &quot;click here&quot; because it stays meaningful when scanned out of context, and it reads clearly to anyone using a screen reader who hears links out of their surrounding text.</p>\n<p>When documentation lives in several places, make the README the route map. 
The reader should know where to go next and what each link is for.</p>\n<h2>Make status explicit</h2>\n<p>A repository can be active, experimental, archived, internal, deprecated, or a historical reference. Say so. Status changes how readers interpret risk.</p>\n<p>If the project is not ready for production use, write that directly. If the public API is unstable, say what may change. If the repository is no longer maintained, add a short note near the top and point to the replacement if one exists.</p>\n<p>Do not hide status in release notes. The README is where new readers arrive.</p>\n<h2>Avoid badge clutter</h2>\n<p>Badges can show useful machine status, such as build, package, licence, or coverage state. They should not become a visual header that delays the actual explanation.</p>\n<p>Use a badge only when a reader can act on it or trust it as live status. Remove stale badges. A failing badge that nobody owns is not transparency. It is decay.</p>\n<h2>Maintain the README as code changes</h2>\n<p>A README becomes misleading when it is treated as a launch artefact. Update it in the same pull request as changes to behaviour, commands, public APIs, configuration, or directory structure.</p>\n<p>The best review question is simple: would a newcomer succeed if they followed this after the change?</p>\n<h2>Conclusion</h2>\n<p>A readable README is direct, current, and operational. Start with purpose, give the shortest successful path, map the repository at a high level, separate usage from development, and link to deeper documentation only when the reader needs it. The result is not more prose. It is less uncertainty.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/writing-a-runbook-your-team-will-use",
      "url": "https://soulstack.co.uk/blog/writing-a-runbook-your-team-will-use",
      "title": "Writing a runbook your team will use",
      "summary": "A runbook is useful only if an engineer can follow it under pressure. It should reduce thinking during an incident, not become another document to interpret.",
      "content_html": "<p>A runbook is useful only if an engineer can follow it under pressure. It should reduce thinking during an incident, not become another document to interpret.</p>\n<h2>Write for the on-call moment</h2>\n<p>The reader may be tired, interrupted, unfamiliar with the service, or responding to several alerts at once. Write for that situation.</p>\n<p>Use direct instructions. Put the safest first action near the top. Avoid long background sections before the responder can do anything useful. A runbook is not a design document. Link to design documents for context, but keep the incident path short.</p>\n<p>The goal is to help the reader decide whether the alert is real, assess impact, mitigate the problem, escalate when needed, and record what happened.</p>\n<h2>Start with scope and ownership</h2>\n<p>Every runbook needs a clear scope. State which service, alert, symptom, or operational task it covers. State who owns the service and where to escalate.</p>\n<p>Include:</p>\n<ul>\n<li>service name</li>\n<li>owning team</li>\n<li>escalation channel</li>\n<li>dashboard link</li>\n<li>alert link or alert name</li>\n<li>primary user impact</li>\n<li>safe rollback or mitigation owner</li>\n</ul>\n<p>Do not assume the reader knows the service. During incidents, support often crosses team boundaries.</p>\n<h2>Put the first five minutes first</h2>\n<p>The opening section should help the responder stabilise the situation.</p>\n<p>Include checks for:</p>\n<ul>\n<li>whether the alert is still firing</li>\n<li>current user impact</li>\n<li>recent deployments or configuration changes</li>\n<li>dependency status</li>\n<li>known maintenance or planned work</li>\n<li>whether the incident needs escalation</li>\n</ul>\n<p>Give commands or links that answer those questions quickly. Avoid making the reader construct queries from memory.</p>\n<h2>Separate diagnosis from mitigation</h2>\n<p>Diagnosis explains what is happening. Mitigation reduces impact. 
They are related, but they are not the same.</p>\n<p>Make mitigation steps explicit and reversible where possible. Label risky actions. State expected outcomes and how long to wait before moving to the next step.</p>\n<p>For example, a runbook can say that scaling workers may reduce queue delay, but it should also say which metric should improve and what limit should not be exceeded.</p>\n<h2>Make commands safe to copy</h2>\n<p>Commands in a runbook should be complete, current, and safe. Include placeholders only when they are obvious and named clearly.</p>\n<p>Good commands have:</p>\n<ul>\n<li>the correct tool name</li>\n<li>the correct environment or namespace</li>\n<li>the exact resource type</li>\n<li>a read-only form before any write form</li>\n<li>expected output or success criteria</li>\n</ul>\n<p>Avoid destructive commands unless the runbook explains the consequence, approval path, and rollback.</p>\n<h2>Include decision points</h2>\n<p>A runbook should not be a blind checklist. It should make decisions easier.</p>\n<p>Use simple branches:</p>\n<ul>\n<li>If the error rate is still rising, escalate to the incident lead.</li>\n<li>If only one region is affected, drain traffic from that region.</li>\n<li>If the last deployment changed the affected component, start rollback.</li>\n<li>If customer data may be affected, involve security and support.</li>\n</ul>\n<p>Keep branches short. If the decision tree becomes deep, split the runbook into smaller runbooks.</p>\n<h2>Keep links operational</h2>\n<p>Links should point to the exact dashboard, alert, deployment, log query, trace query, repository, or service page. A link to a generic homepage is not operational documentation.</p>\n<p>Use stable names and avoid private bookmarks. If a link requires access, state the required group or role. Review links after tool migrations and service renames.</p>\n<h2>Test the runbook before an incident</h2>\n<p>A runbook that has never been tested is a guess. 
Test it during onboarding, game days, readiness reviews, and after material service changes.</p>\n<p>A useful test asks a responder who did not write the runbook to follow it in a safe environment. Watch where they pause, search elsewhere, or ask for help. Those pauses are defects in the runbook.</p>\n<p>After each real incident, update the runbook while the details are still fresh.</p>\n<h2>Keep maintenance owned</h2>\n<p>Runbooks decay when ownership is unclear. Assign an owner and a review cadence. Review after changes to alerts, dashboards, deployment tooling, infrastructure, dependencies, and escalation paths.</p>\n<p>Stale runbooks are dangerous because they look authoritative. If a runbook is known to be incomplete, mark it clearly and fix it before relying on it for on-call coverage.</p>\n<h2>A simple runbook template</h2>\n<p>Use this structure for most operational runbooks.</p>\n<h3>Scope</h3>\n<p>State what this runbook covers and what it does not cover.</p>\n<h3>Impact</h3>\n<p>Describe the likely user impact and how to confirm it.</p>\n<h3>First checks</h3>\n<p>List the fastest checks for alert status, customer impact, recent change, and dependency health.</p>\n<h3>Mitigation</h3>\n<p>List safe actions in order. Include expected outcomes and rollback notes.</p>\n<h3>Diagnosis</h3>\n<p>List deeper checks, useful queries, and known failure modes.</p>\n<h3>Escalation</h3>\n<p>State who to contact, when to contact them, and what information to include.</p>\n<h3>Aftercare</h3>\n<p>State what to record, which issues to create, and which documents to update.</p>\n<h2>Conclusion</h2>\n<p>A good runbook is short, tested, owned, and operational. It gives the responder the first safe actions, clear decision points, exact links, and escalation rules. If the team does not use it during incidents, treat that as a documentation bug and fix it.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "reliability",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/writing-good-commit-messages",
      "url": "https://soulstack.co.uk/blog/writing-good-commit-messages",
      "title": "Writing good commit messages",
      "summary": "A good commit message explains a change quickly, accurately, and in a way that stays useful after the pull request has been closed. It should help a future reader understand what…",
      "content_html": "<p>A good commit message explains a change quickly, accurately, and in a way that stays useful after the pull request has been closed. It should help a future reader understand what changed and why.</p>\n<h2>Write for the future reader</h2>\n<p>The future reader may be investigating a regression, reviewing a release, auditing a security fix, or trying to understand why a line of code exists. They need context, not a diary of the work session.</p>\n<p>A weak message describes the activity:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Update files\"</span></span></code></pre></div><p>A stronger message describes the result:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Validate refresh tokens before issuing sessions\"</span></span></code></pre></div><p>The second message gives the reader a useful index entry in the project history.</p>\n<h2>Keep each commit focused</h2>\n<p>A clear message starts with a clear commit. Do not mix unrelated changes in the same commit. 
A bug fix, a rename, and a formatting sweep should usually be separate commits unless they are inseparable.</p>\n<p>Before committing, inspect the diff:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#79C0FF\"> --check</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> diff</span><span style=\"color:#79C0FF\"> --staged</span></span></code></pre></div><p>The first command warns about whitespace errors and leftover conflict markers in changes that are not yet staged; add <code>--staged</code> to run the same check on what is already in the index. By default it flags trailing whitespace and a space followed by a tab inside the initial indent. The second command shows exactly what is staged. Use both to catch accidental edits before they become part of history.</p>\n<h2>Use a direct subject line</h2>\n<p>Use a short subject line that states the change. Start with a verb in the imperative mood when that reads naturally:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Reject expired password reset links\"</span></span></code></pre></div><p>That style matches the way Git itself describes commits when it generates messages, such as &quot;Merge branch ...&quot; or &quot;Revert ...&quot;. 
The official guidance in Git&#39;s SubmittingPatches document recommends the imperative mood, as if you are giving orders to the codebase to change its behaviour. It also passes a simple test: the subject completes the sentence &quot;This commit will ...&quot;, as in &quot;This commit will reject expired password reset links&quot;.</p>\n<p>Avoid vague subjects such as:</p>\n<ul>\n<li>&quot;Fix bug&quot;</li>\n<li>&quot;More changes&quot;</li>\n<li>&quot;WIP&quot;</li>\n<li>&quot;Address review comments&quot;</li>\n<li>&quot;Final update&quot;</li>\n</ul>\n<p>Those messages force the reader to open the diff before they can begin to understand the purpose.</p>\n<h2>Add a body when the subject is not enough</h2>\n<p>A one-line message is fine for a small, obvious change. Use a body when the change has context, trade-offs, migration steps, or behaviour that is not obvious from the diff.</p>\n<p>You can supply a body with a second <code>-m</code> option. Git concatenates multiple <code>-m</code> values as separate paragraphs, so the subject and body are split by a blank line automatically:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Limit login attempts per account\"</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Track failed attempts against the account identifier rather than the source IP address. This avoids locking out shared networks while still slowing repeated credential attacks.\"</span></span></code></pre></div><p>The body should explain why the change exists and any important constraints. 
It should not repeat every line of the diff.</p>\n<h2>Mention user visible behaviour</h2>\n<p>When a change affects behaviour, say what changes for the user, operator, or caller. That helps release notes, support investigation, and incident review.</p>\n<p>For example:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Return 404 for deleted invoice exports\"</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"Deleted exports no longer return an empty CSV. The API now reports that the export is unavailable, which matches the documented lifecycle.\"</span></span></code></pre></div><p>This message gives the reader the practical effect and the reason.</p>\n<h2>Use structured messages only when they add value</h2>\n<p>Some teams use a structured format such as Conventional Commits. That format can be useful when tooling generates changelogs or infers semantic version changes. 
Under that convention a <code>fix</code> commit maps to a patch release and a <code>feat</code> commit maps to a minor release, while a breaking change marker maps to a major release.</p>\n<p>A structured subject may look like this:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> -m</span><span style=\"color:#A5D6FF\"> \"fix(auth): reject expired password reset links\"</span></span></code></pre></div><p>Use structure consistently or do not use it. A half adopted convention creates noise because readers and tools cannot rely on it.</p>\n<p>Do not let the type replace the explanation. The message still needs to say what changed, and for non-trivial commits the body still needs to explain why.</p>\n<h2>Avoid leaking private context</h2>\n<p>A commit message is part of the repository history. 
It may be copied into mirrors, release notes, package metadata, or audit exports.</p>\n<p>Do not include:</p>\n<ul>\n<li>personal names unless the project explicitly requires attribution there</li>\n<li>client names</li>\n<li>private URLs</li>\n<li>incident details that do not belong in source control</li>\n<li>secrets, tokens, keys, or credentials</li>\n<li>internal system names that are not already public in the repository</li>\n</ul>\n<p>Keep the message about the technical change.</p>\n<h2>Fix the message before sharing it</h2>\n<p>If the commit has not been pushed or shared, amend the message when it is wrong:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> commit</span><span style=\"color:#79C0FF\"> --amend</span></span></code></pre></div><p>This replaces the tip of the current branch with a new commit and reuses the original message as the starting point unless you supply a new one. If several local commits need cleanup, use interactive rebase:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">git</span><span style=\"color:#A5D6FF\"> rebase</span><span style=\"color:#79C0FF\"> -i</span><span style=\"color:#A5D6FF\"> HEAD~3</span></span></code></pre></div><p>Only rewrite commits that are still private, or where the team has explicitly agreed that rewriting the branch is safe. 
Once others may have based work on a commit, prefer a new corrective commit.</p>\n<h2>Use pull request text for review context</h2>\n<p>A commit message and a pull request description do different jobs. The commit message must stand alone in Git history. The pull request description can carry the review plan, screenshots, test evidence, rollout notes, and links to the issue tracker.</p>\n<p>Do not rely on the pull request alone for information that is essential to understanding the committed change. Pull request systems can change, but Git history remains the durable record.</p>\n<h2>Conclusion</h2>\n<p>Good commit messages are specific, verifiable, and useful after the immediate review is over. Keep commits focused, write a direct subject, add a body when the reason matters, and remove private context. A clear history makes debugging, review, and maintenance faster.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "git",
        "cli",
        "devops"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/writing-safe-readable-bash-scripts",
      "url": "https://soulstack.co.uk/blog/writing-safe-readable-bash-scripts",
      "title": "Writing safe, readable Bash scripts",
      "summary": "Bash is useful for glue code, local automation and small operational tasks. It becomes risky when scripts hide failures, split data accidentally or depend on interactive shell hab…",
      "content_html": "<p>Bash is useful for glue code, local automation and small operational tasks. It becomes risky when scripts hide failures, split data accidentally or depend on interactive shell habits. Safe Bash is explicit about its inputs, exits and quoting.</p>\n<h2>Start with the interpreter and shell options</h2>\n<p>Use an interpreter line that matches the script you are writing. If the script uses Bash arrays, <code>[[ ... ]]</code>, process substitution or <code>pipefail</code>, it is a Bash script and should not claim to be portable <code>sh</code>.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#8B949E\">#!/usr/bin/env bash</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">set</span><span style=\"color:#79C0FF\"> -euo</span><span style=\"color:#A5D6FF\"> pipefail</span></span></code></pre></div><p><code>set -e</code> exits after many unhandled command failures. <code>set -u</code> treats unset variables as errors. <code>set -o pipefail</code> makes a pipeline fail when any command in the pipeline fails. These options are useful, but they are not a substitute for clear error handling. 
Bash documents exceptions for <code>errexit</code>, including commands used in the test of a conditional and parts of <code>&amp;&amp;</code> or <code>||</code> lists.</p>\n<p>Use explicit checks where failure is expected.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">if</span><span style=\"color:#FF7B72\"> !</span><span style=\"color:#FFA657\"> grep</span><span style=\"color:#79C0FF\"> -q</span><span style=\"color:#A5D6FF\"> \"ready\"</span><span style=\"color:#A5D6FF\"> status.txt</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FF7B72\">then</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  echo</span><span style=\"color:#A5D6FF\"> \"status.txt does not contain ready\"</span><span style=\"color:#FF7B72\"> >&#x26;2</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  exit</span><span style=\"color:#79C0FF\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">fi</span></span></code></pre></div><h2>Quote variables by default</h2>\n<p>Unquoted parameter expansion can trigger word splitting and pathname expansion. That can turn one value into many arguments, or match files in the current directory. 
Quote variables unless you intentionally want splitting.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">src</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#79C0FF\">$1</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">dest</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#79C0FF\">$2</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FFA657\">cp</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$src</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$dest</span><span style=\"color:#A5D6FF\">\"</span></span></code></pre></div><p>Use arrays when you need to build a command with optional arguments.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">args</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#A5D6FF\">--recursive</span><span style=\"color:#E6EDF3\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">if</span><span style=\"color:#E6EDF3\"> [ </span><span style=\"color:#A5D6FF\">\"${</span><span style=\"color:#E6EDF3\">verbose</span><span style=\"color:#FF7B72\">:-</span><span 
style=\"color:#A5D6FF\">}\"</span><span style=\"color:#FF7B72\"> =</span><span style=\"color:#A5D6FF\"> \"1\"</span><span style=\"color:#E6EDF3\"> ]; </span><span style=\"color:#FF7B72\">then</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  args</span><span style=\"color:#FF7B72\">+=</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#A5D6FF\">--verbose</span><span style=\"color:#E6EDF3\">)</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">fi</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FFA657\">cp</span><span style=\"color:#A5D6FF\"> \"${</span><span style=\"color:#E6EDF3\">args</span><span style=\"color:#A5D6FF\">[</span><span style=\"color:#FF7B72\">@</span><span style=\"color:#A5D6FF\">]}\"</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$src</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$dest</span><span style=\"color:#A5D6FF\">\"</span></span></code></pre></div><p>Do not store a command line in a string and then run it. Store arguments as an array and execute the command directly.</p>\n<h2>Treat input as data, not code</h2>\n<p>Use <code>read -r</code> so backslashes are read literally. 
Set <code>IFS=</code> for line-oriented reads where leading and trailing whitespace matters.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">while</span><span style=\"color:#E6EDF3\"> IFS</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#79C0FF\"> read</span><span style=\"color:#79C0FF\"> -r</span><span style=\"color:#A5D6FF\"> line</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FF7B72\">do</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  printf</span><span style=\"color:#A5D6FF\"> '%s\\n'</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$line</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">done</span><span style=\"color:#FF7B72\"> &#x3C;</span><span style=\"color:#E6EDF3\"> input.txt</span></span></code></pre></div><p>Use <code>printf</code> instead of <code>echo</code> for data. <code>echo</code> has portability and option parsing edge cases. <code>printf</code> gives clear, predictable formatting.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79C0FF\">printf</span><span style=\"color:#A5D6FF\"> '%s\\n'</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$message</span><span style=\"color:#A5D6FF\">\"</span></span></code></pre></div><p>When reading paths from another command, avoid whitespace-delimited loops. 
Prefer null-delimited output and input where available.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">find</span><span style=\"color:#A5D6FF\"> .</span><span style=\"color:#79C0FF\"> -type</span><span style=\"color:#A5D6FF\"> f</span><span style=\"color:#79C0FF\"> -name</span><span style=\"color:#A5D6FF\"> '*.log'</span><span style=\"color:#79C0FF\"> -print0</span><span style=\"color:#FF7B72\"> |</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  while</span><span style=\"color:#A5D6FF\"> IFS</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\"> read</span><span style=\"color:#79C0FF\"> -r</span><span style=\"color:#79C0FF\"> -d</span><span style=\"color:#A5D6FF\"> ''</span><span style=\"color:#A5D6FF\"> path</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FF7B72\">do</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">    gzip</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$path</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  done</span></span></code></pre></div><h2>Make failures visible</h2>\n<p>Print errors to standard error and exit with a non-zero status.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#D2A8FF\">fail</span><span style=\"color:#E6EDF3\">() {</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  printf</span><span 
style=\"color:#A5D6FF\"> 'error: %s\\n'</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#79C0FF\">$*</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#FF7B72\"> >&#x26;2</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">  exit</span><span style=\"color:#79C0FF\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">[ </span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#79C0FF\">$#</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#FF7B72\"> -eq</span><span style=\"color:#79C0FF\"> 2</span><span style=\"color:#E6EDF3\"> ] </span><span style=\"color:#FF7B72\">||</span><span style=\"color:#FFA657\"> fail</span><span style=\"color:#A5D6FF\"> \"usage: deploy SOURCE DEST\"</span></span></code></pre></div><p>Use a trap for cleanup. Keep cleanup idempotent, because it may run after partial failure.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">tmpdir</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"$(</span><span style=\"color:#FFA657\">mktemp</span><span style=\"color:#79C0FF\"> -d</span><span style=\"color:#A5D6FF\">)\"</span></span>\n<span class=\"line\"><span style=\"color:#D2A8FF\">cleanup</span><span style=\"color:#E6EDF3\">() {</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  rm</span><span style=\"color:#79C0FF\"> -rf</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$tmpdir</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span>\n<span 
class=\"line\"><span style=\"color:#79C0FF\">trap</span><span style=\"color:#A5D6FF\"> cleanup</span><span style=\"color:#A5D6FF\"> EXIT</span></span></code></pre></div><p>Use <code>trap</code> for cleanup, not for hiding errors. If a script has several important steps, print the step before running it or wrap it in a small function with a clear name.</p>\n<h2>Keep scripts small and reviewable</h2>\n<p>Put constants near the top. Put reusable operations in functions. Prefer local variables inside functions.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#D2A8FF\">copy_assets</span><span style=\"color:#E6EDF3\">() {</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  local</span><span style=\"color:#E6EDF3\"> src_dir</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#79C0FF\">$1</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  local</span><span style=\"color:#E6EDF3\"> dest_dir</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#79C0FF\">$2</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  mkdir</span><span style=\"color:#79C0FF\"> -p</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$dest_dir</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  cp</span><span style=\"color:#79C0FF\"> -R</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$src_dir</span><span 
style=\"color:#A5D6FF\">\"/.</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$dest_dir</span><span style=\"color:#A5D6FF\">\"/</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Avoid clever one-liners in scripts. A pipeline that is easy to paste into a terminal can be hard to debug in CI. Split complex operations into named steps and intermediate files when that improves reviewability.</p>\n<h2>Validate arguments and environment</h2>\n<p>Check required commands before using them.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79C0FF\">command</span><span style=\"color:#79C0FF\"> -v</span><span style=\"color:#A5D6FF\"> git</span><span style=\"color:#FF7B72\"> ></span><span style=\"color:#A5D6FF\">/dev/null</span><span style=\"color:#FF7B72\"> 2>&#x26;1</span><span style=\"color:#FF7B72\"> ||</span><span style=\"color:#FFA657\"> fail</span><span style=\"color:#A5D6FF\"> \"git is required\"</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">command</span><span style=\"color:#79C0FF\"> -v</span><span style=\"color:#A5D6FF\"> jq</span><span style=\"color:#FF7B72\"> ></span><span style=\"color:#A5D6FF\">/dev/null</span><span style=\"color:#FF7B72\"> 2>&#x26;1</span><span style=\"color:#FF7B72\"> ||</span><span style=\"color:#FFA657\"> fail</span><span style=\"color:#A5D6FF\"> \"jq is required\"</span></span></code></pre></div><p>Validate files and directories before destructive operations.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" 
style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">[ </span><span style=\"color:#FF7B72\">-d</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$build_dir</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\"> ] </span><span style=\"color:#FF7B72\">||</span><span style=\"color:#FFA657\"> fail</span><span style=\"color:#A5D6FF\"> \"build directory does not exist: </span><span style=\"color:#E6EDF3\">$build_dir</span><span style=\"color:#A5D6FF\">\"</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">[ </span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#E6EDF3\">$build_dir</span><span style=\"color:#A5D6FF\">\"</span><span style=\"color:#FF7B72\"> !=</span><span style=\"color:#A5D6FF\"> \"/\"</span><span style=\"color:#E6EDF3\"> ] </span><span style=\"color:#FF7B72\">||</span><span style=\"color:#FFA657\"> fail</span><span style=\"color:#A5D6FF\"> \"refusing to remove /\"</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">rm</span><span style=\"color:#79C0FF\"> -rf</span><span style=\"color:#79C0FF\"> --</span><span style=\"color:#A5D6FF\"> \"</span><span style=\"color:#E6EDF3\">$build_dir</span><span style=\"color:#A5D6FF\">\"</span></span></code></pre></div><p>Use parameter expansion for defaults only when the default is intentional.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E6EDF3\">profile</span><span style=\"color:#FF7B72\">=</span><span style=\"color:#A5D6FF\">\"${</span><span style=\"color:#E6EDF3\">PROFILE</span><span style=\"color:#FF7B72\">:-</span><span style=\"color:#E6EDF3\">dev</span><span 
style=\"color:#A5D6FF\">}\"</span></span></code></pre></div><h2>Lint and test shell scripts</h2>\n<p>Run a syntax check before execution.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">bash</span><span style=\"color:#79C0FF\"> -n</span><span style=\"color:#A5D6FF\"> scripts/deploy.sh</span></span></code></pre></div><p>Run ShellCheck and fix warnings unless there is a documented reason not to. ShellCheck catches common quoting, expansion and portability issues, including unquoted variables that may split or glob.</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">shellcheck</span><span style=\"color:#A5D6FF\"> scripts/deploy.sh</span></span></code></pre></div><p>Test scripts with filenames that contain spaces, empty inputs, missing commands and failing subprocesses. Most shell bugs appear at boundaries, not in the happy path.</p>\n<h2>Know when not to use Bash</h2>\n<p>Use Bash for orchestration. Do not use it for complex data structures, long-lived services or heavy parsing of JSON, XML or YAML. Call a real parser or write a small program in a language with structured data types.</p>\n<p>A Bash script is at its best when it coordinates existing tools, checks their exit statuses and leaves a readable audit trail.</p>\n<h2>Conclusion</h2>\n<p>Safe Bash is mostly discipline. Use Bash features only when the script is declared as Bash, quote data, prefer arrays over command strings, make failure explicit and lint every script. 
The result is not fancy, but it is much easier to trust during a release or incident.</p>\n",
      "date_published": "2026-06-04T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "cli",
        "devops",
        "reliability"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/typescript-discriminated-unions",
      "url": "https://soulstack.co.uk/blog/typescript-discriminated-unions",
      "title": "Modelling state with discriminated unions in TypeScript",
      "summary": "A surprising number of bugs come from states that should never coexist. A request that is both loading and errored. A form that is submitting and has a result. Discriminated union…",
      "content_html": "<p>A surprising number of bugs come from states that should never coexist. A request that is both loading and errored. A form that is submitting and has a result. Discriminated unions let you describe state so that the impossible combinations do not type check.</p>\n<h2>The problem with optional fields</h2>\n<p>Here is the shape a lot of code starts with:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">interface</span><span style=\"color:#FFA657\"> RequestState</span><span style=\"color:#E6EDF3\"> {</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  loading</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> boolean</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  data</span><span style=\"color:#FF7B72\">?:</span><span style=\"color:#FFA657\"> User</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#FFA657\">  error</span><span style=\"color:#FF7B72\">?:</span><span style=\"color:#79C0FF\"> string</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Nothing stops <code>loading</code> being true while <code>data</code> is also set, and every read of <code>data</code> needs a guard. 
The type permits states the program never intends to reach.</p>\n<h2>One tag to rule them</h2>\n<p>Give each state a literal <code>status</code> field and let the others depend on it:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">type</span><span style=\"color:#FFA657\"> RequestState</span><span style=\"color:#FF7B72\"> =</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  |</span><span style=\"color:#E6EDF3\"> { </span><span style=\"color:#FFA657\">status</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#A5D6FF\"> 'idle'</span><span style=\"color:#E6EDF3\"> }</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  |</span><span style=\"color:#E6EDF3\"> { </span><span style=\"color:#FFA657\">status</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#A5D6FF\"> 'loading'</span><span style=\"color:#E6EDF3\"> }</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  |</span><span style=\"color:#E6EDF3\"> { </span><span style=\"color:#FFA657\">status</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#A5D6FF\"> 'success'</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FFA657\">data</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#FFA657\"> User</span><span style=\"color:#E6EDF3\"> }</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  |</span><span style=\"color:#E6EDF3\"> { </span><span style=\"color:#FFA657\">status</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#A5D6FF\"> 'error'</span><span style=\"color:#E6EDF3\">; </span><span style=\"color:#FFA657\">message</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> string</span><span 
style=\"color:#E6EDF3\"> };</span></span></code></pre></div><p>Now <code>data</code> only exists when <code>status</code> is <code>success</code>, and the compiler knows it. Narrowing on the tag unlocks the right fields:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">function</span><span style=\"color:#D2A8FF\"> render</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#FFA657\">state</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#FFA657\"> RequestState</span><span style=\"color:#E6EDF3\">)</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> string</span><span style=\"color:#E6EDF3\"> {</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  switch</span><span style=\"color:#E6EDF3\"> (state.status) {</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">    case</span><span style=\"color:#A5D6FF\"> 'idle'</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">      return</span><span style=\"color:#A5D6FF\"> 'Nothing requested yet'</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">    case</span><span style=\"color:#A5D6FF\"> 'loading'</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">      return</span><span style=\"color:#A5D6FF\"> 'Loading...'</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">    case</span><span style=\"color:#A5D6FF\"> 'success'</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">      return</span><span style=\"color:#E6EDF3\"> 
state.data.name;</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">    case</span><span style=\"color:#A5D6FF\"> 'error'</span><span style=\"color:#E6EDF3\">:</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">      return</span><span style=\"color:#E6EDF3\"> state.message;</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  }</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><h2>Make the compiler check completeness</h2>\n<p>Add an exhaustiveness guard so a new state cannot be forgotten:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">function</span><span style=\"color:#D2A8FF\"> assertNever</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#FFA657\">value</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> never</span><span style=\"color:#E6EDF3\">)</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> never</span><span style=\"color:#E6EDF3\"> {</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  throw</span><span style=\"color:#FF7B72\"> new</span><span style=\"color:#D2A8FF\"> Error</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#A5D6FF\">`Unhandled state: ${</span><span style=\"color:#79C0FF\">JSON</span><span style=\"color:#A5D6FF\">.</span><span style=\"color:#D2A8FF\">stringify</span><span style=\"color:#A5D6FF\">(</span><span style=\"color:#E6EDF3\">value</span><span style=\"color:#A5D6FF\">)</span><span style=\"color:#A5D6FF\">}`</span><span style=\"color:#E6EDF3\">);</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Call it in the <code>default</code> 
branch of the switch, passing it the value being switched on. The day someone adds a <code>cancelled</code> status, every switch that ends in this guard but does not handle the new case becomes a compile error rather than a silent gap.</p>\n<h2>When to reach for it</h2>\n<p>Discriminated unions pay off whenever a value moves through distinct phases: request state, parser results, view models, message protocols. If you find yourself writing comments about which fields are valid together, that is the signal to encode it in the type instead.</p>\n",
      "date_published": "2026-06-03T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "typescript",
        "engineering"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/building-a-static-blog-with-vite-ssg",
      "url": "https://soulstack.co.uk/blog/building-a-static-blog-with-vite-ssg",
      "title": "Building a static blog with vite-ssg",
      "summary": "This site is a Vue single page app, but every route is also pre-rendered to static HTML at build time using vite-ssg. The blog rides on the same machinery, so each post ships as a…",
      "content_html": "<p>This site is a Vue single page app, but every route is also pre-rendered to static HTML at build time using vite-ssg. The blog rides on the same machinery, so each post ships as a real HTML document with its title, meta tags, and structured data already in place.</p>\n<h2>The shape of a post</h2>\n<p>A post is two files. The body is a markdown file with the title as its first heading:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79C0FF;font-weight:bold\"># Building a static blog with vite-ssg</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">This site is a Vue single page app, but every route is also</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">pre-rendered to static HTML at build time using vite-ssg.</span></span></code></pre></div><p>The metadata sits in a typed registry keyed by slug:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">export</span><span style=\"color:#FF7B72\"> const</span><span style=\"color:#79C0FF\"> BLOG_POSTS</span><span style=\"color:#FF7B72\"> =</span><span style=\"color:#E6EDF3\"> {</span></span>\n<span class=\"line\"><span style=\"color:#A5D6FF\">  'building-a-static-blog-with-vite-ssg'</span><span style=\"color:#E6EDF3\">: {</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">    seoTitle: </span><span style=\"color:#A5D6FF\">'Building a static blog with vite-ssg'</span><span 
style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">    metaDescription: </span><span style=\"color:#A5D6FF\">'How this blog renders every post to static HTML.'</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">    datePublished: </span><span style=\"color:#A5D6FF\">'2026-06-02'</span><span style=\"color:#E6EDF3\">,</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">    tags: [</span><span style=\"color:#A5D6FF\">'vue'</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">'vite'</span><span style=\"color:#E6EDF3\">, </span><span style=\"color:#A5D6FF\">'seo'</span><span style=\"color:#E6EDF3\">],</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">  },</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">};</span></span></code></pre></div><h2>Rendering happens once, at build time</h2>\n<p>A small Node script reads the markdown, renders it to HTML with marked, and highlights code blocks with Shiki. 
The result is written to a generated JSON file that the app imports directly:</p>\n<div class=\"code-block\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> generated </span><span style=\"color:#FF7B72\">from</span><span style=\"color:#A5D6FF\"> './posts.generated.json'</span><span style=\"color:#E6EDF3\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">export</span><span style=\"color:#FF7B72\"> const</span><span style=\"color:#79C0FF\"> blogPosts</span><span style=\"color:#FF7B72\"> =</span><span style=\"color:#E6EDF3\"> generated.posts;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">export</span><span style=\"color:#FF7B72\"> function</span><span style=\"color:#D2A8FF\"> getPostBySlug</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#FFA657\">slug</span><span style=\"color:#FF7B72\">:</span><span style=\"color:#79C0FF\"> string</span><span style=\"color:#E6EDF3\">) {</span></span>\n<span class=\"line\"><span style=\"color:#FF7B72\">  return</span><span style=\"color:#E6EDF3\"> blogPosts.</span><span style=\"color:#D2A8FF\">find</span><span style=\"color:#E6EDF3\">((</span><span style=\"color:#FFA657\">post</span><span style=\"color:#E6EDF3\">) </span><span style=\"color:#FF7B72\">=></span><span style=\"color:#E6EDF3\"> post.slug </span><span style=\"color:#FF7B72\">===</span><span style=\"color:#E6EDF3\"> slug);</span></span>\n<span class=\"line\"><span style=\"color:#E6EDF3\">}</span></span></code></pre></div><p>Because the highlighting runs in the build, neither marked nor Shiki ends up in the browser bundle. 
Readers get pre-styled HTML, and the client only hydrates it.</p>\n<h2>Count your posts</h2>\n<p>Every post is a markdown file in one directory, so checking how many you have is a one-line command. Pick your shell or language.</p>\n<div class=\"code-group\" data-code-group><div class=\"code-group__tabs\" role=\"tablist\"><button class=\"code-group__tab is-active\" type=\"button\" role=\"tab\" aria-selected=\"true\" data-cg-tab=\"0\">Bash</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"1\">Python</button><button class=\"code-group__tab\" type=\"button\" role=\"tab\" aria-selected=\"false\" data-cg-tab=\"2\">PowerShell</button></div><div class=\"code-group__panel is-active\" role=\"tabpanel\" data-cg-panel=\"0\"><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FFA657\">ls</span><span style=\"color:#A5D6FF\"> src/content/blog/posts/</span><span style=\"color:#79C0FF\">*</span><span style=\"color:#A5D6FF\">.md</span><span style=\"color:#FF7B72\"> |</span><span style=\"color:#FFA657\"> wc</span><span style=\"color:#79C0FF\"> -l</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"1\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#FF7B72\">import</span><span style=\"color:#E6EDF3\"> glob</span></span>\n<span class=\"line\"><span style=\"color:#79C0FF\">print</span><span style=\"color:#E6EDF3\">(</span><span style=\"color:#79C0FF\">len</span><span style=\"color:#E6EDF3\">(glob.glob(</span><span 
style=\"color:#A5D6FF\">\"src/content/blog/posts/*.md\"</span><span style=\"color:#E6EDF3\">)))</span></span></code></pre></div><div class=\"code-group__panel\" role=\"tabpanel\" data-cg-panel=\"2\" hidden><button class=\"code-copy\" type=\"button\" aria-label=\"Copy code\"><span class=\"code-copy__label\">Copy</span></button><pre class=\"shiki github-dark-default\" style=\"background-color:#0d1117;color:#e6edf3\" tabindex=\"0\"><code><span class=\"line\"><span>(Get-ChildItem src/content/blog/posts/*.md).Count</span></span></code></pre></div></div><h2>Why bother</h2>\n<p>Three reasons:</p>\n<ul>\n<li>Crawlers get the full article in the initial HTML response, which is the whole point of an SEO blog</li>\n<li>The page is fast because there is no content fetch after load</li>\n<li>The author workflow is just writing markdown and opening a pull request</li>\n</ul>\n<p>It is a small amount of build code in exchange for a blog that behaves like a static site and reads like a dynamic one.</p>\n",
      "date_published": "2026-06-02T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "vue",
        "vite",
        "seo",
        "engineering"
      ]
    },
    {
      "id": "https://soulstack.co.uk/blog/notes-from-the-workshop",
      "url": "https://soulstack.co.uk/blog/notes-from-the-workshop",
      "title": "Notes from the Soulstack workshop",
      "summary": "This is where we write about the software we build and the decisions behind it. There is no fixed theme and no schedule, just short technical notes when something is worth sharing.",
      "content_html": "<p>This is where we write about the software we build and the decisions behind it. There is no fixed theme and no schedule, just short technical notes when something is worth sharing.</p>\n<h2>Why a code first blog</h2>\n<p>Every post here is a markdown file in the site repository. The metadata lives next to it in a small typed registry, and the whole thing is validated and rendered at build time. That means posts are reviewed like code, version controlled like code, and shipped like code.</p>\n<p>It also means the blog has no database, no admin panel, and no runtime to keep alive. The output is plain static HTML.</p>\n<h2>What to expect</h2>\n<p>A few kinds of writing will show up here:</p>\n<ul>\n<li>Short writeups on tools and techniques we reach for</li>\n<li>The occasional deep dive when a problem earns it</li>\n<li>Honest accounts of what did not work and why</li>\n</ul>\n<p>That is the whole plan. Welcome, and thanks for reading.</p>\n",
      "date_published": "2026-06-01T00:00:00.000Z",
      "date_modified": "2026-06-04T00:00:00.000Z",
      "tags": [
        "engineering",
        "meta"
      ]
    }
  ]
}