Reading code you did not write
Reading unfamiliar code is a skill. The goal is not to understand every line on the first pass. The goal is to build a reliable map of behaviour, boundaries, and risk, so that when you finally change something you change the right thing.
Start with the purpose
Before opening files at random, identify why the code exists. Read the README, package metadata, command names, tests, issue links, and recent pull request descriptions. These artefacts often explain intent better than the implementation does.
Look for the public surface first. For a library, find exported functions and documented examples. For an application, find routes, jobs, command handlers, or event consumers. For a build tool, find inputs, outputs, and the command that runs it.
Purpose gives you a filter. Without it, every helper looks equally important.
Find the entry points
Entry points tell you where execution begins. Common examples include:
- HTTP route registration.
- Command-line command definitions.
- Worker or queue consumers.
- Scheduled jobs.
- Package exports.
- Test setup.
- Build scripts.
Once you find an entry point, follow one realistic path from input to output. Do not branch into every helper immediately. Mark unknowns and continue until you reach the result.
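As a rough illustration of hunting for the entry points listed above, the sketch below scans Python source text for a few common markers. The patterns, the `find_entry_points` helper, and the sample module are all assumptions made for this example; a real project needs patterns that match its own frameworks.

```python
import re

# Hypothetical patterns; adjust them to the frameworks the project really uses.
ENTRY_POINT_PATTERNS = {
    "http route": re.compile(r"@\w+\.(route|get|post|put|delete)\("),
    "cli command": re.compile(r"@\w+\.command\(|add_parser\("),
    "script main": re.compile(r"if __name__ == [\"']__main__[\"']"),
}


def find_entry_points(source):
    """Return (line_number, kind, text) for lines that look like entry points."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in ENTRY_POINT_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, kind, line.strip()))
    return hits


# Sample module text, held in a string so no framework has to be installed.
SAMPLE = '''\
from flask import Flask
app = Flask(__name__)

@app.route("/orders")
def list_orders():
    return []

if __name__ == "__main__":
    app.run()
'''

for hit in find_entry_points(SAMPLE):
    print(hit)
```

A scan like this only produces candidates. Pick one hit and follow it from input to output before looking at the others.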
Use navigation tools, not just text search
Text search is useful, but code navigation gives stronger signals. Use go to definition to find the implementation behind a symbol. Use find references to see who calls it. Use call hierarchy to drill into callers of callers where the language server or IDE supports it.
In an editor, peek definition and peek references can keep context visible while you inspect a path, so you do not lose your place. On hosted repositories, symbol navigation and code search can help you move between functions and classes before you have a local environment.
Treat generated code, vendored code, and framework code differently from project code. They may be necessary, but they are rarely where you should start.
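To show why find references gives a stronger signal than plain text search, here is a minimal sketch that uses Python's standard `ast` module to list call sites of a function by name. The `find_callers` helper and the sample source are hypothetical. It matches on name only, ignores calls made at module level, and would attribute a call inside a nested function to both the inner and outer function, so treat it as an approximation of what a language server does with full type information.

```python
import ast


def find_callers(source, target):
    """Return (enclosing_function, line_number) for each call to `target`."""
    tree = ast.parse(source)
    callers = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            callee = node.func
            # Plain calls are ast.Name; method-style calls are ast.Attribute.
            if isinstance(callee, ast.Name):
                name = callee.id
            else:
                name = getattr(callee, "attr", None)
            if name == target:
                callers.append((func.name, node.lineno))
    return sorted(callers, key=lambda item: item[1])


SAMPLE = '''\
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

def main():
    words = parse("in.txt")
    raw = load("in.txt")
    return words, raw
'''

print(find_callers(SAMPLE, "load"))
```

Unlike a text search for "load", this does not match comments, strings, or identifiers that merely contain the word.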
Read tests as executable documentation
Tests show expected behaviour, edge cases, and naming used by maintainers. Start with tests near the feature you are reading. Then look at fixtures, mocks, and setup code.
A good test tells you what matters. A brittle test tells you what the code is coupled to. Both are useful.
When there are no tests, look for examples, issue reports, or command output. If none exist, create a small local reproduction as you learn.
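One way to build such a reproduction is a characterisation test: a test that records what the code does today, whether or not that behaviour is desirable. The sketch below assumes a hypothetical `legacy_format_price` function standing in for the unfamiliar code; the test names and values are illustrative only.

```python
import unittest


def legacy_format_price(cents):
    # Stand-in for the unfamiliar code under study (hypothetical).
    if cents is None:
        return "n/a"
    return f"{cents // 100}.{cents % 100:02d}"


class LegacyFormatPriceCharacterisation(unittest.TestCase):
    def test_whole_amount(self):
        self.assertEqual(legacy_format_price(1200), "12.00")

    def test_missing_amount(self):
        self.assertEqual(legacy_format_price(None), "n/a")

    def test_negative_amount_records_current_behaviour(self):
        # Floor division and modulo on a negative number give a surprising
        # result. The test pins what happens now; it does not endorse it.
        self.assertEqual(legacy_format_price(-5), "-1.95")
```

Saved in a file, this can be run with `python -m unittest`. A pinned surprise such as the negative case is worth raising with maintainers before anyone "fixes" it, because callers may depend on it.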
Build a vocabulary list
Unfamiliar code often contains domain terms that are not obvious from general programming knowledge. Write them down.
For each term, capture:
- Where it is created.
- Where it is persisted.
- Whether it is user input, internal state, or derived data.
- Whether it is stable API language or local implementation language.
This is especially useful when two terms look similar. An account, a user, a member, and a principal may not mean the same thing.
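A vocabulary list can live in a notes file, but a small structure keeps the four questions above consistent from term to term. The following is a minimal sketch; the `Term` and `Origin` types and the file and table names in the glossary are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    USER_INPUT = "user input"
    INTERNAL_STATE = "internal state"
    DERIVED = "derived data"


@dataclass(frozen=True)
class Term:
    name: str
    created_in: str              # where the value is first constructed
    persisted_in: Optional[str]  # None when it never reaches storage
    origin: Origin
    stable_api: bool             # True if the term appears in public API


def describe(term):
    stored = term.persisted_in or "not persisted"
    language = "stable API" if term.stable_api else "local implementation"
    return (
        f"{term.name}: created in {term.created_in}; {stored}; "
        f"{term.origin.value}; {language}"
    )


# Hypothetical entries; the paths and table names are made up.
GLOSSARY = [
    Term("principal", "auth/session.py", None, Origin.DERIVED, False),
    Term("member", "teams/invite.py", "members table", Origin.USER_INPUT, True),
]

for term in GLOSSARY:
    print(describe(term))
```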
Trace data shape changes
Many bugs and misunderstandings happen at boundaries where data changes shape. Follow how data is parsed, validated, enriched, transformed, serialised, and stored.
Pay attention to optional fields, defaults, null handling, date handling, identifiers, and error values. These details often explain why code that looks redundant is actually preserving compatibility.
A diagram can help, but keep it small. A five-node sketch of input, validation, storage, output, and side effects is often enough.
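The sketch below walks one record through parse, validate, and serialise steps to show where its shape changes. The `Order` type, its field names, and the default quantity are assumptions made for this example, not a description of any particular codebase.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Order:
    order_id: str
    quantity: int
    note: Optional[str]
    placed_at: datetime


def parse(raw):
    # Shape change 1: text becomes an untyped dict.
    return json.loads(raw)


def validate(payload):
    # Shape change 2: untyped dict becomes a typed record.
    return Order(
        order_id=str(payload["id"]),               # id may arrive as int or str
        quantity=int(payload.get("quantity", 1)),  # default applied here
        note=payload.get("note"),                  # optional field stays None
        placed_at=datetime.fromisoformat(payload["placed_at"]),
    )


def serialise(order):
    # Shape change 3: typed record becomes storage text, normalised to UTC.
    # Assumes placed_at carries a UTC offset; a naive value would be
    # interpreted as local time by astimezone().
    record = asdict(order)
    record["placed_at"] = order.placed_at.astimezone(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


raw = '{"id": 42, "placed_at": "2024-05-01T12:00:00+02:00"}'
print(serialise(validate(parse(raw))))
```

Three things changed that a casual read could miss: the identifier went from number to string, a missing quantity became 1, and the timestamp was shifted to UTC.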
Check error handling
Error handling reveals design intent. Look for which errors are retried, swallowed, wrapped, logged, returned to callers, or shown to users.
Ask:
- Which errors are expected?
- Which errors are treated as impossible?
- Which errors cross process or network boundaries?
- Which errors include enough context for debugging?
- Which errors may expose private data?
Do not assume that a catch block is correct just because it is deliberate. Error handling is often where old assumptions remain after the main path changes.
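As a small illustration of the difference between swallowing and wrapping, compare the two hypothetical functions below. Both read an optional port setting. The names, the `ConfigError` type, and the default of 8080 are assumptions for the example.

```python
class ConfigError(Exception):
    """Raised when a setting is present but unusable (hypothetical type)."""


def read_port_swallowing(settings):
    try:
        return int(settings["port"])
    except Exception:
        # Looks deliberate, but hides both a missing key and a bad value.
        return 8080


def read_port_wrapping(settings):
    try:
        return int(settings["port"])
    except KeyError:
        # Expected: the port is optional, so fall back to the default.
        return 8080
    except ValueError as exc:
        # Unexpected: wrap with context and keep the original as the cause.
        raise ConfigError(
            f"port must be an integer, got {settings['port']!r}"
        ) from exc


print(read_port_swallowing({"port": "80a"}))  # silently falls back
print(read_port_wrapping({}))                 # expected fallback
```

With the first version, a typo in configuration silently starts the service on the default port. With the second, the same typo fails with a message that names the bad value. When reading real code, it is worth asking which of the two the author intended.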
Understand side effects
Side effects are where code that looks like a harmless read turns out to be risky. Identify writes, network calls, cache updates, queue publishes, emails, file changes, metrics, and logs.
For each side effect, ask when it happens, whether it can happen twice, and whether it is transactional with the main operation. Idempotency matters in retries, background jobs, and event-driven systems.
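The "can it happen twice" question can be made concrete with a small sketch of an idempotent event handler. The `WelcomeEmailSender` class and its event identifiers are hypothetical, and the in-memory set is a simplification: a real system would need a durable store shared by all workers, ideally updated in the same transaction as the side effect.

```python
class WelcomeEmailSender:
    """Sends at most one welcome email per event id (illustrative only)."""

    def __init__(self):
        self._processed = set()  # assumption: stands in for a durable store
        self.sent = []           # stands in for the real email side effect

    def handle(self, event_id, address):
        """Return True if an email was sent, False if the event was a repeat."""
        if event_id in self._processed:
            return False
        self.sent.append(address)
        self._processed.add(event_id)
        return True


sender = WelcomeEmailSender()
sender.handle("evt-1", "new.user@example.com")
sender.handle("evt-1", "new.user@example.com")  # retry of the same event
print(sender.sent)
```

When you read a handler that lacks a check like this, note it on your map: a retry or a redelivered message will repeat the side effect.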
Read history carefully
Version control history can explain why code is shaped in a surprising way. Use blame and pull request links to find context, but do not treat old rationale as automatically current.
A comment from three years ago may explain a workaround that is still required. It may also describe a dependency version that no longer exists. Verify before preserving or removing it.
Permanent links to specific lines are useful when discussing code, because they point to a fixed commit rather than a moving branch.
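The difference between a fixed commit and a moving branch shows up in the link itself. The sketch below builds a line-level link, assuming a GitHub-style URL layout (`/blob/<commit>/<path>#L<start>-L<end>`) and 40-character SHA-1 commit identifiers. The `permalink` helper and the repository URL are invented for the example, and other hosting services use different layouts.

```python
import re

# Assumption: full SHA-1 commit ids (40 lowercase hex characters).
SHA_PATTERN = re.compile(r"[0-9a-f]{40}")


def permalink(repo_url, commit_sha, path, start, end=None):
    """Build a GitHub-style link to a line range at a fixed commit."""
    if not SHA_PATTERN.fullmatch(commit_sha):
        # A branch name such as "main" would point at moving content.
        raise ValueError("use a full commit SHA, not a branch name")
    if end is None or end == start:
        anchor = f"#L{start}"
    else:
        anchor = f"#L{start}-L{end}"
    base = repo_url.rstrip("/")
    return f"{base}/blob/{commit_sha}/{path.lstrip('/')}{anchor}"


print(permalink("https://github.com/example/project", "a" * 40, "src/app.py", 10, 12))
```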
Make a small change only after mapping the risk
Before changing unfamiliar code, state what you believe the affected surface is. Include direct callers, tests, configuration, public API, storage format, and operational side effects.
Then make the smallest change that proves the path. Run the narrow test first, then the broader relevant suite. If there is no test, add one where possible.
Leave the map better than you found it
When you learn something non-obvious, preserve it. That may be a renamed variable, a clearer test name, a README note, a comment near a compatibility workaround, or a small architecture note.
Do not document obvious syntax. Document decisions, boundaries, and traps.
Conclusion
Reading unfamiliar code is controlled exploration. Start with purpose, find entry points, follow one path, use navigation tools, read tests, trace data, and map side effects. You do not need to understand everything before making progress. You need to understand enough to change the right thing safely.
