Threat modeling for engineers

A threat model is a list of bad things that could happen to your system, ranked by how much they would hurt, with the mitigation for each one written next to it. That is all. The frameworks (STRIDE, PASTA, LINDDUN, OCTAVE) are different ways to make sure the list is reasonably complete. Most engineers will only ever use STRIDE, and they will use it for two hours at the start of a project and another hour every quarter. The point of this chapter is to make those three hours produce a useful artifact instead of an unreadable Confluence page nobody opens again.


What threat modeling actually is

Threat modeling is the practice of systematically finding what can go wrong with a system before an attacker does it for you. The word "systematically" is the whole game. Any engineer can brainstorm three or four ways their service could be abused; the point of a method is to make the search reasonably complete, so the bad outcome you miss is the obscure one rather than the obvious one staring at you from the architecture diagram. Most breaches in real post-mortems are not clever. They are a missing authorization check, a secret in a URL, a queue worker running as root. A threat model is the cheap, structured hour that catches those before they ship.

It is worth being clear about what it is not. It is not a pentest, which probes a system that already exists. It is not a code review, which reads what was written. Threat modeling happens earlier and at a higher altitude: you reason about the design, not the implementation, which is why an afternoon spent on it before coding is worth more than the same afternoon spent after the code is in production. You are looking for the structural mistakes — the trust you placed in the wrong place — that no amount of careful coding inside a bad design will save you from.

It also does not need to be a heavyweight ceremony. The industry has produced a long list of formal methodologies — STRIDE, PASTA, LINDDUN, OCTAVE, Trike — and the existence of that list scares teams off, because they imagine a multi-week process with a dedicated security architect and a forty-page deliverable. That version exists and almost nobody needs it. The version that matters fits in a working week and, after the first time, in a couple of hours per component. The rest of this chapter is that version.

The four questions

Adam Shostack's reformulation of threat modeling is the most useful version. Forget the frameworks for a moment. Sit down with the team and answer four questions:

1. What are we building? Draw the system. Boxes for components, lines for data flow, dashed lines for trust boundaries. Five components, maybe ten — if your diagram has fifty boxes the model is at the wrong altitude.

2. What can go wrong? Walk through the diagram and, for each component and each data flow, enumerate bad outcomes. STRIDE is the checklist that keeps you honest.

3. What are we going to do about it? For every bad outcome, either accept it (and write down why), mitigate it (and write down how), transfer it (insurance, contract, vendor), or eliminate it (cut the feature).

4. Did we do a good job? Review the model when the design changes. The habit of revisiting is what separates a working threat model from a dead one.

The four-questions framing is short enough to remember and complete enough to drive the meeting. Almost everything else in threat modeling is detail to fill in question 2.

STRIDE — the six classes that catch most things

STRIDE is an acronym for six classes of threat. It was developed at Microsoft in 1999 and has held up better than any of its successors mostly because it is short. The trick is that each letter is the inverse of a property you wanted: spoofing breaks authentication, tampering breaks integrity, repudiation breaks non-repudiation, information disclosure breaks confidentiality, denial of service breaks availability, and elevation of privilege breaks authorization. So walking STRIDE is the same as asking, for each component, which of those six guarantees you are quietly assuming and have not actually built.

S — Spoofing. Pretending to be someone you are not. Stolen credentials, session hijacking, IP spoofing, SSRF against an internal service that trusts source IPs. A concrete one: a support tool that identifies the caller by a customer id in the request body, so anyone who knows a customer id can act as that customer. Mitigations: real authentication everywhere, including service-to-service; mTLS; signed tokens with audience checks. This is the area the authentication primitives page covers in depth.

T — Tampering. Modifying data in transit or at rest without permission. MITM attacks on plaintext channels, modifying request bodies, JSON injection, parameter pollution, modifying objects in an object-store bucket that has public write. Concrete: a price field sent from the client to the checkout endpoint and trusted by the server, so the user buys a laptop for one dollar. Mitigations: TLS, HMAC or signed payloads on sensitive data, integrity hashes, strict bucket policies, immutable audit logs, and never trusting a client-supplied value that affects money or access.

R — Repudiation. A user later claiming they did not do something the system recorded them doing — or the system being unable to prove who did what. Missing audit trails, mutable logs, timestamps the user can set. Concrete: a money transfer with no server-side record of who initiated it, so a fraudulent transfer and a legitimate one look identical after the fact. Mitigations: signed events, append-only logs, server-issued timestamps, idempotency keys tied to the authenticated identity.

I — Information disclosure. Leaking data the user should not see. Verbose error messages, debug headers in production, GraphQL introspection turned on, IDOR ("can I see invoice 1000 by changing the URL?"), timing side channels, accidentally returning PII in responses. Concrete: GET /api/invoices/12345 returns any invoice to any logged-in user because the handler authenticates the request but never checks that the invoice belongs to the caller. Mitigations: redaction at the response layer, output schemas, generic error messages in prod, authorization on every read, never trusting client-supplied object ids. This class overlaps heavily with the common CVE classes that dominate real vulnerability reports.

D — Denial of service. Making the system unavailable. Unbounded queries, ReDoS, zip bombs, slow-loris connections, billion-laughs XML, expensive endpoints with no rate limit. Concrete: a search endpoint that allows arbitrary wildcards, so a single query can scan the whole table and exhaust the connection pool, taking the site down for everyone. Mitigations: rate limiting per identity, request size and complexity limits, query timeouts, complexity analysis for GraphQL, circuit breakers on downstream calls.

E — Elevation of privilege. A low-privilege user gaining higher privilege. Missing authorization checks, role confusion, JWT alg=none, path traversal that reads /etc/shadow, deserialization that constructs admin objects, SUID binaries. Concrete: an admin-only endpoint protected only by hiding the button in the UI, so anyone who finds the route can call it. Mitigations: explicit allow-lists for roles, authorization enforced on the server for every action, defense in depth, never trusting the JWT alg field, sandboxing anything that executes user input.

Run STRIDE per component, not per system. Sit with the diagram, point at one box (the API server, say), and ask: how could someone spoof a caller this box trusts? Tamper with its data? Deny an action it performed, with no record to contradict them? And so on through the remaining letters. Six questions per box, ten boxes, sixty questions. The answers are your bad-outcome list.
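The six-questions-per-box walk is mechanical enough to generate. The sketch below produces the prompt list for a set of component names; the prompt wording and the function name are illustrative, not part of STRIDE itself.

```python
from itertools import product

# One prompt per STRIDE letter; "{c}" is replaced by the component name.
STRIDE_PROMPTS = {
    "S": "Spoofing: who could pretend to be a legitimate caller of {c}?",
    "T": "Tampering: what data handled by {c} could be modified without permission?",
    "R": "Repudiation: which actions in {c} could someone later deny, with no record?",
    "I": "Information disclosure: what could {c} leak to someone who should not see it?",
    "D": "Denial of service: what request could make {c} unavailable?",
    "E": "Elevation of privilege: how could a low-privilege caller of {c} gain more?",
}


def stride_checklist(components):
    """Return one question per (component, STRIDE letter) pair."""
    return [
        f"[{letter}] {prompt.format(c=component)}"
        for component, (letter, prompt) in product(components, STRIDE_PROMPTS.items())
    ]


if __name__ == "__main__":
    for line in stride_checklist(["API server", "worker"]):
        print(line)
```

Ten component names in, sixty questions out. The list is only a prompt for the meeting; the answers still have to come from people who know the system.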

The other useful way to apply STRIDE is along a single request as it travels. Take one real path — a user submitting a payment — and ask, at each hop, which of the six letters is in play. The same six classes show up at different hops with different weights, which is what the next diagram makes concrete. Read it left to right: the threat sits on the arrow, the mitigation sits under it.

[Figure: one request, STRIDE per hop. Path: browser → API → worker → payment. Letters on the three hops: S · I, E · R, T · D. Threats on the arrows: spoof identity, authz bypass, tamper reply. Mitigations beneath: token + TLS, row-level authz, signed msg, verify signature. Legend: S spoof · T tamper · R repudiate · I info-disc · D dos · E elevation.]
STRIDE mapped onto one request path. The threat rides the arrow; the mitigation sits beneath it. Walking a real request this way catches the gaps a per-box pass can miss.

Trust boundaries — the part that matters most

A trust boundary is a line in the diagram across which data has different levels of trust. The user's browser to your API server is a trust boundary. Your API server to your database is a trust boundary. Your application code to the database row it reads is a trust boundary. Anything that crosses a trust boundary is suspect on the receiving side.

Trust boundaries get the most attention because almost every real vulnerability lives on one. Code inside a single trust zone, talking to code it already trusts, is rarely where the bug is. The bug is at the moment data arrives from somewhere with less trust and gets treated as if it came from somewhere with more — the unvalidated input, the unauthenticated caller assumed to be authenticated, the cached value assumed fresh. If you only had time to do one thing, you would draw the boundaries and check what defenses sit on each crossing, and you would catch most of what a full STRIDE pass would find.

The most useful threat-modeling instinct, and one that takes about a week of practice to develop, is to look at your data-flow diagram and ask "where is the trust boundary I am not seeing". The classic cases:

The form input that becomes a SQL parameter. Obvious trust boundary; most teams handle it with parameterised queries.

The CSV upload that becomes a streaming parser. Less obvious. Trust boundary. A 10MB CSV with a billion-row header is a DoS. A CSV with embedded formulas that Excel auto-executes is a phishing payload for your business users. Both require defenses.

The webhook signature you "validate". Trust boundary on receipt. Verify the signature with constant-time comparison; reject malformed payloads; rate-limit the endpoint; do not just trust that the IP is from the right CIDR (IPs are forgeable in some configurations and providers change them anyway).

The cache that holds another user's data. Trust boundary on read. Cache keys must include the authorization scope, or entries leak across users. The recurring version of this bug is a CDN or shared cache returning one user's personalized page to another, because the cache key ignored the session and the response carried no Vary or Cache-Control: private header.

The async job that processes a queue message. Trust boundary at dequeue time. The message was placed by code that trusted the caller; the worker has no authentication context unless you put one in the message. Defense: include the originating user and their authorization claims in the message, sign the message, validate signature and claims on dequeue.
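The webhook and queue cases share one mechanism: the receiver recomputes an HMAC over the exact bytes it received, compares in constant time, and only then reads the content. The sketch below shows that for a queue message carrying the originator's identity. The field names (user_id, scopes), the five-minute window, and the shared-key setup are assumptions for illustration; a time window narrows replay but does not remove it, so a real worker would also deduplicate on a message id.

```python
import hashlib
import hmac
import json
import time

MAX_AGE_SECONDS = 300  # assumption: accept messages up to five minutes old


def sign_message(payload, key, now=None):
    """Wrap payload with an issue time and an HMAC-SHA256 signature."""
    issued_at = int(time.time() if now is None else now)
    body = json.dumps({"payload": payload, "issued_at": issued_at}, sort_keys=True)
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_message(message, key, now=None):
    """Return the payload if signature and age check out, else raise ValueError."""
    body = str(message.get("body", ""))
    supplied = str(message.get("sig", ""))
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison: timing does not reveal how much of the signature matched.
    if not hmac.compare_digest(expected.encode(), supplied.encode()):
        raise ValueError("bad signature")
    # Only parse the body after the signature has been verified.
    envelope = json.loads(body)
    current = time.time() if now is None else now
    if abs(current - envelope["issued_at"]) > MAX_AGE_SECONDS:
        raise ValueError("message outside the accepted time window")
    return envelope["payload"]
```

The worker then authorizes the job against the verified user_id and scopes rather than against its own credentials.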

Data-flow diagrams — the canonical format

The data-flow diagram is the artifact everything else hangs off, so it is worth drawing it the same way every time. A DFD has four shapes, and the discipline of using exactly these four is what makes the diagram a thinking tool rather than decoration:

External entity (rectangle): a user, a third-party service, a customer's browser. Things outside your system that interact with it.

Process (circle or rounded rectangle): something that takes input and produces output. The API server, the worker, the function in your codebase that hashes passwords. Each process is a threat-modeling unit.

Data store (open-ended rectangle): the database, the cache, the queue, the object store. Persistent things.

Data flow (arrow): how data moves between the above. The interesting labels here are the protocol (HTTP, SQL, MQTT) and the content (user-supplied JSON, signed JWT, raw bytes from CSV).

Draw the trust boundaries on top as dashed lines. The boundary between user-browser and your-API-server. The boundary between API-server and database. Anywhere data crosses a boundary, write the validation, authentication, and authorization that happens at the crossing — even if it is "none", because "none" is the most common defect.

Here is the same idea drawn properly. Each box is a process or a store, each arrow is a data flow labelled with its protocol, and each dashed line is a trust boundary. The boundaries are the lines you actually walk during the STRIDE pass; everything that crosses one is a candidate threat.

[Figure: data-flow diagram. External entities: browser, payment API, shipping API. Processes: API server, worker. Stores: orders DB, job queue, audit log. Flows labelled HTTPS, SQL, enqueue, dequeue, append, TLS. Three boundaries, seven crossings; each crossing gets a STRIDE walk.]
A data-flow diagram with trust boundaries drawn on top. Rectangles are external entities and stores; rounded boxes are processes. The dashed lines are where trust changes.
[Browser] --HTTPS-->  ( API server )  --TLS-->  [ payment provider ]
                          |
                          |  --SQL--> [orders DB]
                          |
                          |  --enqueue--> [job queue]
                                              |
                                              v
                                          ( worker )  --HTTPS--> [ shipping API ]
                                              |
                                              v
                                          [audit log]

Trust boundaries:
  Browser  | API server               <- everything from here is hostile
  API      | DB / queue               <- the API is the only writer
  Queue    | worker                   <- message originator must be in payload
  Worker   | external provider        <- responses must be validated
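The same diagram can be kept as data, which makes "what sits on each crossing" a query instead of a visual inspection. The sketch below is a minimal model of the flows drawn above. The zone names, the node identifiers, and the controls listed are placeholders invented for the example; an empty controls tuple stands for the "none" case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    protocol: str
    controls: tuple = ()  # checks written down for this crossing; empty means "none"


# Assumption: one trust zone per node, chosen to mirror the boundaries listed above.
ZONE = {
    "browser": "internet",
    "api": "api",
    "orders_db": "storage",
    "job_queue": "storage",
    "audit_log": "storage",
    "worker": "worker",
    "payment_provider": "external",
    "shipping_api": "external",
}

FLOWS = [
    Flow("browser", "api", "HTTPS", ("session auth", "input validation")),
    Flow("api", "payment_provider", "TLS", ("response validation",)),
    Flow("api", "orders_db", "SQL", ("parameterised queries",)),
    Flow("api", "job_queue", "enqueue", ("signed message",)),
    Flow("job_queue", "worker", "dequeue"),       # nothing written down yet
    Flow("worker", "shipping_api", "HTTPS"),      # nothing written down yet
    Flow("worker", "audit_log", "append", ("append-only role",)),
]


def crossings(flows, zone):
    """Flows whose endpoints sit in different trust zones."""
    return [f for f in flows if zone[f.src] != zone[f.dst]]


def undefended(flows, zone):
    """Boundary crossings with no control recorded."""
    return [f for f in crossings(flows, zone) if not f.controls]


if __name__ == "__main__":
    for flow in undefended(FLOWS, ZONE):
        print(f"no control recorded: {flow.src} -> {flow.dst} ({flow.protocol})")
```

With the placeholder data, the two flows flagged are the dequeue and the call to the shipping API, which is where the STRIDE walk would start.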

Attack trees — when a threat is worth digging into

An attack tree decomposes one bad outcome into the steps an attacker would have to take. Root: "attacker reads another user's invoices". Children: "attacker has a valid session for the target user" OR "attacker can forge an invoice id and bypass authorization" OR "attacker reads the DB directly". Grandchildren: how each of those is achieved.

Written out, that tree looks like the block below. The OR nodes mean any one child is enough; AND nodes (not shown here) would mean an attacker needs all the children together. Tagging each leaf with a rough cost turns the picture into a priority list.

ROOT: read another user's invoices
 ├─ OR  hijack the target's session
 │       ├─ steal session cookie via XSS        [cost: medium]
 │       └─ phish the user's password           [cost: medium]
 ├─ OR  bypass authorization on the invoice API
 │       └─ change the id in /api/invoices/{id}  [cost: trivial]  <-- fix first
 └─ OR  read the database directly
         ├─ SQL injection on a search field      [cost: medium]
         └─ leaked DB credentials in a repo      [cost: low]

Attack trees are most useful for the top-ranked threats from the STRIDE pass — the ones you want to spend serious mitigation effort on. They turn a vague "someone could read another user's data" into a concrete list of paths, each with its own probability and mitigation cost.

You do not need to attack-tree everything. One per quarter on the highest-ranked open threat is plenty. The exercise itself is what surfaces gaps in the mental model.

The reason attack trees earn their keep on the hard threats is that they force you to price the attack. A leaf that requires stealing a hardware token, knowing an internal id, and racing a five-second window is expensive; a leaf that requires changing one number in a URL is free. When you can see the cheapest path from root to leaf, you know exactly where to spend a mitigation dollar — you cut the cheapest path first, because that is the one an attacker will take. A tree where the cheapest path is already costly is a tree you can stop worrying about.
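Finding the cheapest path is a small recursion: an OR node costs as much as its cheapest child, an AND node costs the sum of its children. The sketch below encodes the invoice tree from this section. Mapping the cost tags to the numbers 0, 1, and 2 is an assumption made so they can be compared; they are ordinal ranks, not dollar figures.

```python
from dataclasses import dataclass, field

# Assumption: ordinal ranks for the cost tags used in the tree above.
COST = {"trivial": 0, "low": 1, "medium": 2}


@dataclass
class Node:
    label: str
    kind: str = "LEAF"  # "LEAF", "OR", or "AND"
    cost: int = 0       # only meaningful for leaves
    children: list = field(default_factory=list)


def cheapest(node):
    """Return (cost, leaf labels) for the cheapest way to achieve this node."""
    if node.kind == "LEAF":
        return node.cost, [node.label]
    results = [cheapest(child) for child in node.children]
    if node.kind == "OR":
        # Any one child is enough, so the attacker takes the cheapest.
        return min(results, key=lambda result: result[0])
    # AND: every child is required, so costs add up.
    return sum(r[0] for r in results), [leaf for r in results for leaf in r[1]]


ROOT = Node("read another user's invoices", "OR", children=[
    Node("hijack the target's session", "OR", children=[
        Node("steal session cookie via XSS", cost=COST["medium"]),
        Node("phish the user's password", cost=COST["medium"]),
    ]),
    Node("bypass authorization on the invoice API", "OR", children=[
        Node("change the id in /api/invoices/{id}", cost=COST["trivial"]),
    ]),
    Node("read the database directly", "OR", children=[
        Node("SQL injection on a search field", cost=COST["medium"]),
        Node("leaked DB credentials in a repo", cost=COST["low"]),
    ]),
])

if __name__ == "__main__":
    print(cheapest(ROOT))
```

Removing a branch once it is mitigated and rerunning shows which path becomes the cheapest next, which is the order to fix things in.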

Ranking — DREAD is dead, use likelihood × impact

DREAD (Damage, Reproducibility, Exploitability, Affected users, Discoverability) was the scoring scheme that originally accompanied STRIDE. Microsoft had stopped using it by 2008 because the scores were subjective and rarely reproducible. Two engineers scoring the same threat produced different DREAD totals, which made the ranking meaningless.

The current best practice is much simpler: rank each threat on a 1-3 scale for likelihood (how likely is it that an attacker would actually find and exploit this) and a 1-3 scale for impact (how bad would it be if they did). Multiply. The possible products are 1, 2, 3, 4, 6, and 9. Anything scoring 6 or 9 goes in the fix-now pile; a 3 or 4 is fix-soon; a 1 or 2 is documented and watched.
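As a sketch, the whole scoring scheme is a few lines. The threats and scores below are the five from the template at the end of this chapter; the function names are illustrative.

```python
def rank(likelihood, impact):
    """Return (score, bucket) for 1-3 likelihood and impact ratings."""
    for value in (likelihood, impact):
        if value not in (1, 2, 3):
            raise ValueError("scores must be 1, 2, or 3")
    score = likelihood * impact
    if score >= 6:
        return score, "fix now"
    if score >= 3:
        return score, "fix soon"
    return score, "watch"


# (threat, likelihood, impact), deliberately unsorted
THREATS = [
    ("Worker pulls poisoned message, runs forever", 2, 2),
    ("Logged-in user reads another user's invoice", 3, 3),
    ("Stripe webhook IP spoofed via partner range", 1, 2),
    ("Attacker replays old webhook from Stripe", 2, 3),
    ("Audit log can be tampered after the fact", 1, 3),
]


def ranked(threats):
    """Score every threat and sort highest first."""
    rows = [(name, *rank(likelihood, impact)) for name, likelihood, impact in threats]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, score, bucket in ranked(THREATS):
        print(f"{score}  {bucket:<8}  {name}")
```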

The point is not the precision of the numbers — it is forcing a comparison between threats so you spend effort in roughly the right order. A team with 47 documented threats that have not been ranked will spend its time on the one the engineer who wrote them was most worried about, not the one with the highest exposure.

A few rules of thumb keep the scoring honest. For likelihood, anchor on the cost to the attacker: something a script-kiddie can do with a browser and a guess is a three; something that needs insider access or a chain of three other bugs is a one. For impact, anchor on blast radius and reversibility: one user's non-sensitive data is a one, every user's financial records is a three, and anything that lets an attacker pivot deeper into the system is at least a two regardless of the immediate data involved. When two engineers disagree on a score, that disagreement is itself useful — it usually means the threat is underspecified and needs splitting into the two different things people are imagining.

Likelihood times impact is deliberately coarse because precision here is false comfort. You are not predicting a probability; you are sorting a list. Three buckets — fix now, fix soon, watch — is all the resolution the decision needs, and finer scales just invite arguments about whether something is a 6.5 or a 7 that have no bearing on what you actually do next.

Cap the model at a useful size. A threat model with 80 entries is unloved and unread. A threat model with 12 entries, ranked, with current mitigation status, is consulted at every design review. Be aggressive about merging similar threats and dropping the bottom of the list. The model is a working document; treat it like one.

When to threat model — and what counts as "done"

Three good moments to invest the two hours:

At the start of a new component. Before you start coding the new payments service, sit down for an afternoon and produce the model. Catches design-level mistakes (secrets in URLs, plaintext fields in the database) that would be painful to fix later.

When a design changes significantly. A new external integration, a new user type, a new data class (PII, financial, health). Re-walk the diagram; update the trust boundaries; re-STRIDE the new components.

After an incident. Not just the post-mortem — actually update the model so the threat that happened is in there with the mitigation you ended up shipping. Otherwise the next person on the team will hit the same class of bug.

"Done" is when the team agrees the top of the list (anything ranked 6+) is mitigated or explicitly accepted with a rationale. The rest stays in the document for the next review. You will never have a system without threats; you will have a system whose worst threats you have made a decision about.

The word "explicitly" is doing real work in that sentence. Accepting a risk is a legitimate choice — not every threat is worth the cost of mitigation, and some are someone else's problem by contract. But the acceptance has to be written down with a name and a reason, because the alternative is the silent acceptance that comes from never having looked. A documented "we accept the small risk that the audit log can be edited by a database admin, because the database admins are a trusted, audited group of three people" is a decision. The same gap, unwritten, is a finding waiting to surprise you. The artifact's real product is not a list of fixes; it is a record of decisions you can defend later.

Common mistakes the model exists to catch

In rough order of how often they appear in real post-mortems:

Missing authorization checks at the row level. Authentication confirms the user is who they say they are; authorization confirms they may do this particular thing. Many systems have strong authentication and weak authorization: every endpoint requires login, but any logged-in user can GET /api/invoices/12345. The STRIDE category is information disclosure or elevation; the failure pattern has its own name (IDOR, Insecure Direct Object Reference) and accounts for a large share of bug-bounty findings.

Trusting the client for security-critical fields. The classic example is a POST body that includes role: "user" and the server uses it. The variant everyone gets caught by once is a feature flag in the query string. Defense: server is the single source of truth for any field that affects authorization, billing, or routing.

Verbose errors in production. Stack traces in a 500 response page, GraphQL introspection enabled, debug headers turned on. Each gives an attacker a map of the internals. Mitigation: generic error responses in prod, internal-only verbose logging.

Secrets in URLs. URLs end up in web server logs, browser history, third-party analytics, and the Referer header sent to the next page the user clicks through to. Anything sensitive (session token, password reset token, API key) belongs in the request body or a header, never the URL.

Symmetric trust between microservices. Service A trusts requests from service B because B is on the same VPC. Service B is compromised; A inherits it. Defense: authenticate every request, not every network. mTLS or signed tokens between services.

Async jobs running with admin privileges. Worker dequeues a message and has database root credentials because "it's a worker, what could go wrong". The user who enqueued the message effectively has admin access through the worker. Defense: workers run with the minimum privileges the workload needs; messages carry the originator's authorization context.

Notice that every one of these maps cleanly to a STRIDE letter and a trust boundary. IDOR is information disclosure or elevation at the API boundary. Trusting client fields is tampering at the same boundary. Verbose errors are information disclosure. Secrets in URLs are information disclosure through a channel you forgot was a channel. Symmetric service trust is spoofing across the network boundary. Admin workers are elevation across the queue boundary. The list is not a separate body of knowledge from STRIDE — it is what STRIDE finds when you actually run it, which is the argument for running it.

Tools that help — and ones that get in the way

Microsoft's Threat Modeling Tool is free, runs on Windows, and generates STRIDE prompts from a DFD. It is useful as a structured prompt for newcomers; the output is verbose and most teams strip it down before using it.

OWASP's Threat Dragon is open source, web-based, exports JSON. Pragmatic middle ground between Microsoft TMT and a whiteboard photo.

pytm models the system as Python code (components, dataflows, boundaries as objects) and generates the diagram and a STRIDE checklist. Good for teams that want the threat model under version control alongside the code; the learning curve is real.

Plain Markdown in the repo. The most common tool used by working teams. Diagram in Mermaid or a checked-in SVG; the threat list as a table; ranking in numbers. Lives next to the code, shows up in PRs, gets updated as a side effect of working on the component. The simplest tool that solves the problem is the one that gets used.

The tools that get in the way are the ones that want a separate workflow — a SaaS platform you log into once a quarter, a heavy framework that demands a 50-field form per threat, anything that decouples the model from the code. They produce beautiful deliverables nobody reads.

A version that fits in a working week

The heavyweight image of threat modeling — a security architect, a war room, weeks of workshops — is what stops most teams from doing it at all. Here is the version that fits a normal sprint, spread across a working week so no single day is eaten:

Monday, one hour: draw it. Get the two or three people who know the system in a room or a call and draw the data-flow diagram on a whiteboard. Five to ten boxes, the arrows between them, the protocols on the arrows. Mark the trust boundaries. Stop when the picture matches the system; do not gold-plate the diagram.

Tuesday, one hour: walk STRIDE. Point at each box and each boundary crossing and run the six questions. Capture every bad outcome as a one-line entry. Do not argue about likelihood yet — capture first, judge later, or the loudest person in the room anchors the whole list.

Wednesday, half an hour: rank. Score likelihood and impact one to three, multiply, sort. Now you have a list with a top and a bottom instead of a wall of equal worries.

Thursday, decide. For everything ranked six or higher, pick one of the four responses — mitigate, accept, transfer, eliminate — and write it down. This is where most of the real engineering decisions get made, and where you may decide a feature is not worth its risk.

Friday, fifteen minutes: write it down. Type the diagram, the table, and the decisions into a Markdown file in the repo. Done. You have spent under three hours of focused time and you have an artifact that will pay for itself the first time it stops one of the six common bugs from shipping.

After the first time, the recurring cost is far lower. A new component reuses the same diagram conventions and the same STRIDE checklist; the model for a feature change is an edit, not a rewrite. The working week is the upfront cost; steady-state is an hour here and there.

Keeping the model alive

The failure mode of threat modeling is not doing it badly. It is doing it once, beautifully, and never again. A threat model written at the start of a project and never touched is wrong within a quarter, because the system it describes no longer exists. The whole value is in question four — "did we do a good job?" — answered repeatedly, not once.

The way to keep it alive is to put it where the work already happens. A Markdown file next to the code shows up in pull requests, so a change that adds a new external integration naturally prompts the reviewer to ask whether the threat model was updated. Tie the review to events that already exist: a design doc, a new data class entering the system, an incident. Do not schedule a quarterly "threat model meeting" that everyone dreads and skips; fold the update into the change that triggered it.

Two small habits do most of the work. First, when a new component is designed, the design doc links its threat model — no model, no merge. Second, when an incident closes, the post-mortem action items include adding the threat that happened to the relevant model with the mitigation that was shipped. Both make the model a byproduct of work you were doing anyway, which is the only way a living document survives. A model that requires a separate ritual to maintain will not be maintained; a model that lives in the diff will.
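"No model, no merge" can be enforced by a check as small as the one below, run in CI. It assumes a convention this chapter does not prescribe: each component lives in its own directory and keeps its model in a file named THREAT_MODEL.md. Both the filename and the function name are hypothetical.

```python
from pathlib import Path

MODEL_FILENAME = "THREAT_MODEL.md"  # hypothetical convention, pick your own


def components_missing_model(repo_root, component_dirs):
    """Return the component directories that have no threat model file."""
    root = Path(repo_root)
    return [
        name
        for name in component_dirs
        if not (root / name / MODEL_FILENAME).is_file()
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_models.py <repo_root> <component_dir> [<component_dir> ...]
    missing = components_missing_model(sys.argv[1], sys.argv[2:]) if len(sys.argv) > 2 else []
    for name in missing:
        print(f"missing {MODEL_FILENAME}: {name}")
    sys.exit(1 if missing else 0)
```

The check only proves the file exists. Whether it was updated for the change under review is still a question for the reviewer.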

One more sign of health: the model shrinks as often as it grows. As mitigations ship and features get cut, threats move to "resolved" or disappear. A model that only ever accumulates is a model nobody is pruning, and an unpruned model is on its way to being ignored. Treat it like code — refactor it, delete dead entries, keep it readable.

A two-hour template

A working template that fits in one Markdown file:

# Threat model — Payments service

## 1. What are we building?
[diagram: browser → API → payments worker → Stripe; orders DB; audit log]
Trust boundaries: browser/API, API/internal services, worker/Stripe, worker/audit.

## 2. What can go wrong?
Per component, STRIDE walk.

| # | Threat                                          | Class | L | I | LxI | Status        |
|---|-------------------------------------------------|-------|---|---|-----|---------------|
| 1 | Logged-in user reads another user's invoice     | I     | 3 | 3 | 9   | Mitigated     |
| 2 | Attacker replays old webhook from Stripe        | S/T   | 2 | 3 | 6   | Mitigated     |
| 3 | Worker pulls poisoned message, runs forever     | D     | 2 | 2 | 4   | Mitigated     |
| 4 | Audit log can be tampered with after the fact   | R     | 1 | 3 | 3   | Accepted, see |
| 5 | Stripe webhook IP spoofed via partner range     | S     | 1 | 2 | 2   | Mitigated     |

## 3. Mitigations
1. Row-level authorization in invoice repo; integration test for cross-user fetch.
2. Webhook idempotency key + timestamp window check; reject replays > 5 min.
3. Worker has per-message timeout (30s) and a max-attempts kill.
4. Audit log is append-only; tamper detection would require log shipping to a
   separate account (deferred, low risk).
5. Stripe signature validation is the source of truth, not IP.

## 4. Review
Last updated 2026-03-12. Re-review when the worker gains shipping integration.

That is the entire artifact. Two hours of conversation, fifteen minutes to type, lives next to the code, gets updated when the design changes. Almost every team would benefit from one of these per significant component; almost no team has them, which is why incident reports keep finding the same six classes of bug.
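Mitigation 1 in the template ("row-level authorization in invoice repo; integration test for cross-user fetch") might look roughly like the sketch below. The class and field names are invented for the example and the store is an in-memory dict; the point is that the ownership check lives in the one function every read goes through, and that a missing invoice and someone else's invoice produce the same error.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Raised for a missing invoice and for one the caller does not own."""


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: str
    total_cents: int


class InvoiceRepo:
    def __init__(self, invoices):
        self._by_id = {invoice.id: invoice for invoice in invoices}

    def get_for_user(self, invoice_id, user_id):
        """Return the invoice only if it belongs to user_id."""
        invoice = self._by_id.get(invoice_id)
        # Same error in both cases, so responses do not reveal which ids exist.
        if invoice is None or invoice.owner_id != user_id:
            raise NotFound(invoice_id)
        return invoice


def test_cross_user_fetch_is_rejected():
    repo = InvoiceRepo([Invoice(id=12345, owner_id="alice", total_cents=4200)])
    assert repo.get_for_user(12345, "alice").total_cents == 4200
    try:
        repo.get_for_user(12345, "bob")
    except NotFound:
        return
    raise AssertionError("bob was able to read alice's invoice")
```

The user_id passed in must come from the authenticated session, never from the request body or the URL.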

Further reading

Adam Shostack's Threat Modeling: Designing for Security is the canonical text; the four-questions framing is from chapter 1. Shostack's Threats: What Every Engineer Should Learn from Star Wars uses the films as worked examples; less heavy, more memorable. The OWASP Threat Modeling Cheat Sheet is a useful one-pager. For a pragmatic case study, Cloudflare's engineering blog has a series on threat-modeling their own services that demonstrates the model staying alive over time.

Inside this section, the common CVE classes page catalogues the bugs your STRIDE pass is trying to find before they become CVEs; the secrets management and authentication pages cover two of the mitigation patterns that show up most often in the "what do we do about it" column. The TLS material in the networking section covers the cryptographic underpinnings most threat models depend on but rarely re-prove.
