The CVE classes you actually meet
The CVE database has tens of thousands of entries. The half-dozen patterns on this page account for almost all the ones a working web engineer will personally cause — and each one is the same trust-boundary slip, drawn once in the diagram.
The one pattern that connects them all
The diagram above is the whole chapter. Each class on this page is a trust-boundary slip — user-controlled data getting interpreted as code, instructions, paths, or commands somewhere it should not have been. Once you can see the shape, the defenses repeat: parameterise, allow-list, normalise, time-cap, never trust the client. The point of this chapter is to give you the pattern recognition fast enough that you spot the vulnerable shape in your own diff before it ships.
Every vulnerability class on this page is an instance of one mistake — code somewhere interpreted untrusted input as something other than data. SQL injection interprets it as SQL. Command injection interprets it as shell. Cross-site scripting interprets it as JavaScript in the browser. Deserialization interprets it as object constructors. SSRF interprets it as a URL. Path traversal interprets it as a filesystem path. ReDoS interprets it as a regex input. Prototype pollution interprets it as a JS object key.
The defense in every case is the same shape: separate data from instructions. Use APIs that take data as a parameter rather than letting it become part of the instruction stream. Where that is impossible, validate against a strict allow-list. Where even that is impossible, bound the time and memory the operation can consume. The five rules at the end of this chapter formalise the pattern; the sections in between walk through what each class looks like and what the safe shape is.
It helps to hang these on a map you already know. The OWASP Top 10 is the industry's shorthand for the risks that cause real incidents, and most of this chapter maps onto it directly: broken access control sits at number one, injection (which folds in SQL, command, and cross-site scripting) sits near the top, and insecure design, security misconfiguration, vulnerable components, identification failures, software-and-data integrity failures, and SSRF round out the list. The point of the Top 10 is not to memorise ten names; it is to recognise that a small set of shapes produces almost every breach, and to wire your defenses to those shapes. If you would rather start with how an engineer designs against these classes while building a feature, the threat modeling chapter walks through that process; here we stay close to the code.
SQL injection — the patriarch
Untrusted input gets concatenated into a SQL string. The string is parsed by the database; what you intended as data becomes part of the query.
# Vulnerable
cur.execute(f"SELECT * FROM users WHERE name = '{user_input}'")
# user_input = "x' OR '1'='1"
# query becomes: SELECT * FROM users WHERE name = 'x' OR '1'='1' -> returns all rows
# Safe — parameterised
cur.execute("SELECT * FROM users WHERE name = %s", (user_input,))
# The database driver sends name and value separately; user_input is data,
# never parsed as SQL.
Parameterised queries (also called prepared statements, bound parameters) are the only defense that scales. They work because the database wire protocol distinguishes the SQL template from the parameter values; the parser sees the template once, then the values are inserted as data. Every modern driver supports them. Use them on every query, not just the obviously dangerous ones: the safe-looking ones are how SQL injection bugs survive code review.
Three places parameterisation does not cover:
Identifiers (table names, column names). Most drivers cannot parameterise these. If your code lets a user pick which column to ORDER BY, do not concatenate — use an allow-list of valid columns and reject anything else.
LIMIT / OFFSET in some drivers. Older MySQL clients required these as integers in the query string. Cast to integer in your code and validate the range before interpolating.
Dynamic query building. Reporting tools that build "WHERE" clauses from user-chosen filters need a query builder (sqlalchemy, knex, sea-orm) that parameterises the values while letting your code shape the structure. Hand-rolled string concatenation here is where the worst SQL injection bugs live.
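The first two exceptions can be sketched with the standard-library sqlite3 driver. The table, the allow-list SORTABLE_COLUMNS, the cap MAX_LIMIT, and the function list_users are illustrative assumptions rather than part of any real schema. The user-supplied value is still bound as a parameter; only the validated identifier and integer are interpolated.

```python
import sqlite3

# Hypothetical allow-list: the only columns a caller may sort by.
SORTABLE_COLUMNS = {"name", "created_at"}
MAX_LIMIT = 100


def list_users(conn, name_prefix, sort_by="name", limit=20):
    """Return user names starting with name_prefix, sorted by an allow-listed column."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    limit = int(limit)  # raises ValueError on non-numeric input
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit out of range")
    # sort_by and limit are now known-safe; the user value is bound with "?".
    query = f"SELECT name FROM users WHERE name LIKE ? ORDER BY {sort_by} LIMIT {limit}"
    return [row[0] for row in conn.execute(query, (name_prefix + "%",))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "2024-01-02"), ("albert", "2024-01-01"), ("bob", "2024-01-03")],
)
print(list_users(conn, "al"))
```

A request that passes sort_by="name; DROP TABLE users" is rejected before any SQL is built, and a hostile name_prefix is only ever compared as a string.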
Command injection — same shape, different parser
Untrusted input gets concatenated into a shell command. The shell interprets metacharacters and your innocent grep becomes an arbitrary-command-execution primitive.
# Vulnerable
os.system(f"convert {user_filename} output.png")
# user_filename = "x.png; rm -rf /"
# shell runs: convert x.png ; rm -rf / output.png
# Safe — pass args as a list, no shell involved
subprocess.run(["convert", user_filename, "output.png"], check=True)
# The OS exec syscall takes argv as separate strings; no shell parsing happens.
The rule: prefer APIs that take arguments as a list. Where the underlying tool insists on a shell (some legacy binaries, some pipeline syntax), wrap with shlex.quote (Python) or equivalent in your language; never rely on filtering "dangerous" characters because the list of dangerous characters in a shell is longer than people remember.
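For the case where a shell really is unavoidable, here is a minimal sketch of quoting with shlex.quote. The function build_grep_pipeline and the pipeline itself are hypothetical; the only point is that each untrusted value is quoted before it joins the command string. shlex.quote targets POSIX shells, so this does not carry over to cmd.exe or PowerShell.

```python
import shlex


def build_grep_pipeline(pattern, filename):
    """Build a shell pipeline string with every untrusted value quoted."""
    # "--" stops grep from treating a pattern that starts with "-" as an option.
    return f"grep -- {shlex.quote(pattern)} {shlex.quote(filename)} | wc -l"


command = build_grep_pipeline("error", "app.log; rm -rf /")
print(command)
# The hostile filename is now a single quoted word, not a second command:
# grep -- error 'app.log; rm -rf /' | wc -l
```

The resulting string is what you would hand to subprocess.run(command, shell=True); the list form without a shell remains the better choice whenever the tool allows it.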
Variants of the same bug live in: spawn() calls in Node.js with shell:true, backticks and exec() in Ruby/Perl, eval() of any kind anywhere. If your code accepts user input and passes it to anything that might invoke a shell or evaluate code, audit the call chain.
Deserialization — instructions disguised as data
A serialisation format that supports object construction (Java's ObjectInputStream, Python pickle, Ruby Marshal, some YAML loaders, Node's serialize-javascript) can construct arbitrary classes when it deserialises. An attacker who controls the byte stream controls what gets constructed; carefully chosen objects trigger code execution as side effects of their constructors or destructors.
# Vulnerable Python
data = pickle.loads(request.body)
# attacker sends a pickle stream that constructs an os.system call
# Safe alternative — JSON, which has no class-construction capability
data = json.loads(request.body)
# If you must use pickle (you almost never must), restrict the class set:
ALLOWED = {("collections", "OrderedDict")}  # example (module, name) pairs you explicitly trust
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"refused {module}.{name}")
The defense is to never deserialise untrusted input with a format that supports arbitrary construction. JSON has no class-construction capability and is the right default for inbound data. YAML's safe-load variants disable construction. Protobuf and MessagePack are structurally incapable of arbitrary deserialization. Pickle, Java ObjectInputStream, Ruby Marshal, .NET BinaryFormatter, and PHP unserialize are all dangerous and should not see untrusted bytes.
Where you do need rich serialization (caching internal objects, IPC between your own services), pair it with a message authentication code such as HMAC so the receiver verifies the bytes came from a trusted producer before deserialising. The pattern: append HMAC(secret, payload); on receipt, recompute the HMAC, compare it in constant time, and only then pass the payload to the deserialiser. This is the same idea as signed cookies, applied to serialised objects.
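A minimal sketch of that verify-before-deserialise pattern, using only the standard library. The function names sign and verify_and_load and the hard-coded SECRET are assumptions for illustration; a real key would come from a secrets manager and be shared only between your own producer and consumer.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign(secret: bytes, payload: bytes) -> bytes:
    """Return payload with an HMAC-SHA256 tag appended."""
    tag = hmac.new(secret, payload, hashlib.sha256).digest()
    return payload + tag


def verify_and_load(secret: bytes, blob: bytes):
    """Verify the tag first; only bytes that pass are handed to pickle."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to carry a tag")
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time, so timing does not leak the tag.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to deserialise")
    return pickle.loads(payload)


SECRET = b"example-only-secret"  # assumption: injected at runtime in practice
blob = sign(SECRET, pickle.dumps({"user_id": 42}))
print(verify_and_load(SECRET, blob))
```

The signature only proves the bytes came from a holder of the key. It does nothing if the key leaks, which is one more reason to prefer a data-only format for anything that crosses a trust boundary.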
SSRF — the most underestimated class
Server-Side Request Forgery: your server makes an HTTP request to a URL the attacker controls. The attacker uses your server as a proxy to reach internal services that the attacker could not reach directly — the cloud metadata endpoint, internal admin consoles, databases on private subnets.
# Vulnerable — webhook URL or "fetch image from URL"
def webhook(url):
    response = requests.get(url)
    return response.text
# attacker sends: http://169.254.169.254/latest/meta-data/iam/security-credentials/
# With IMDSv1, AWS answers with the instance role's name; requesting that name
# under the same path returns the role's temporary credentials.
# (This is the Capital One 2019 breach in one paragraph.)
The defense is layered because no single check catches everything:
Allow-list of hostnames. If the feature is "fetch an image from a URL the user pastes", restrict to a specific set of known image hosts. Almost every "user-pasted URL" feature can work this way; teams reach for "allow any URL" out of laziness.
Block private IP ranges before connecting. Resolve the hostname to an IP, check the IP against the RFC 1918 private ranges (10/8, 172.16/12, 192.168/16), link-local (169.254/16), loopback (127/8), IPv6 equivalents. Reject if any match. Re-check after redirects — many SSRF bypasses use a redirect from a public IP to a private one.
Use IMDSv2 on AWS. IMDSv2 requires a session token that is obtained with a PUT request and sent back as a header on every metadata call. That defeats the simplest SSRF-against-IMDS attack, because an attacker who only controls the URL of a GET request cannot perform the handshake. Set the instance metadata to v2-only; the Capital One breach used IMDSv1.
Use an outbound proxy with allow-listing. At scale, route all server-side outbound HTTP through a forward proxy that enforces the allow-list at the network layer. Application-layer checks can be bypassed by DNS rebinding (the hostname resolves to a public IP at check time, a private IP at connect time); network-layer checks at the proxy cannot.
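The "block private IP ranges" step can be sketched with the standard-library ipaddress module. The names BLOCKED_NETWORKS, is_blocked_ip, and resolve_public_addresses are illustrative, and the list covers only the ranges named above plus their IPv6 equivalents; a production list would likely be longer.

```python
import ipaddress
import socket

# Ranges named in the text, plus IPv6 equivalents.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918 private
        "169.254.0.0/16",                                  # link-local (cloud metadata)
        "127.0.0.0/8",                                     # loopback
        "::1/128", "fc00::/7", "fe80::/10",                # IPv6 loopback, unique-local, link-local
    )
]


def is_blocked_ip(address: str) -> bool:
    """True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (::ffff:10.0.0.1) so it cannot dodge the IPv4 rules.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in BLOCKED_NETWORKS)


def resolve_public_addresses(hostname: str, port: int = 443) -> list:
    """Resolve hostname; raise if any resulting address is internal."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        if is_blocked_ip(address):
            raise PermissionError(f"{hostname} resolves to blocked address {address}")
    return addresses


print(is_blocked_ip("169.254.169.254"), is_blocked_ip("8.8.8.8"))
```

To keep DNS rebinding from undoing the check, the request should then connect to one of the vetted addresses rather than resolving the hostname a second time, and the same check has to run again on every redirect target.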
Prototype pollution — JavaScript's special case
In JavaScript, an object's prototype chain is mutable at runtime. If user-controlled input can write through a key called __proto__, the write lands on the object's prototype rather than on the object itself, and because most objects share Object.prototype, the change propagates to every object in the program.
// Vulnerable: a "merge" function that copies all keys without checking __proto__
function merge(target, source) {
  for (const key in source) {
    if (typeof source[key] === "object" && source[key] !== null) {
      if (!target[key]) target[key] = {};
      merge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
  return target;
}
merge({}, JSON.parse(req.body));
// req.body = '{"__proto__": {"isAdmin": true}}'
// Now every object in the program inherits isAdmin = true.
Real CVEs from this class: lodash before 4.17.12 had this in its defaultsDeep function (CVE-2019-10744), and earlier lodash releases had it in merge. jQuery, async, minimist, and dozens of other npm packages have had variations.
Defenses:
Use Object.create(null) for maps. Objects created this way have no
prototype; setting __proto__ on them does not change a chain.
Reject dangerous keys at the merge boundary. Disallow keys named
__proto__, constructor, prototype when copying from
untrusted input. Most modern merge libraries do this; check the version.
Freeze Object.prototype. Object.freeze(Object.prototype) at
app startup prevents any modification. Some libraries break under this; test before
deploying.
Path traversal — the .. that gets you
User input gets used as a filesystem path. The path includes .. to escape
the directory the developer intended.
# Vulnerable
def read_upload(user_filename):
    return open(f"/var/www/uploads/{user_filename}").read()
# user_filename = "../../../etc/passwd"
# opens /etc/passwd
# Safe
import os
base = os.path.realpath("/var/www/uploads")
def read_upload(user_filename):
    target = os.path.realpath(os.path.join(base, user_filename))
    if not target.startswith(base + os.sep):
        raise PermissionError("path escapes base")
    return open(target).read()
The defenses that work:
Resolve to canonical form, then check the prefix. Use the OS's canonicalisation (realpath, Path.resolve) so symlinks and .. are normalised, then verify the result is inside the allowed directory. Naive string checks for ".." miss URL-encoded variants (%2e%2e), Windows backslashes, and filter-evasion tricks such as ....//, which turns back into ../ once a filter strips the inner ../ from it.
Use indirection. Store uploaded files with random ids; never let the user supply the filename for reads. Map id → path server-side; users see only the id.
chroot or container. If the workload is "open whatever file the user names", run it in a chroot or a container with only the expected directory mounted. The defense at the OS layer survives bugs at the application layer.
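The indirection defense can be sketched in a few lines. UPLOAD_DIR here is a temporary directory standing in for the real upload location, and save_upload and read_upload are hypothetical helpers; the user never supplies a filename, only an id that must match a strict pattern.

```python
import os
import re
import tempfile
import uuid

UPLOAD_DIR = tempfile.mkdtemp()  # stand-in for the real upload directory
ID_PATTERN = re.compile(r"[0-9a-f]{32}")  # exactly what uuid4().hex produces


def save_upload(data: bytes) -> str:
    """Store data under a server-generated id and return the id."""
    file_id = uuid.uuid4().hex
    with open(os.path.join(UPLOAD_DIR, file_id), "wb") as f:
        f.write(data)
    return file_id


def read_upload(file_id: str) -> bytes:
    """Read an upload by id; anything that is not a well-formed id is rejected."""
    if not isinstance(file_id, str) or not ID_PATTERN.fullmatch(file_id):
        raise PermissionError("invalid file id")
    with open(os.path.join(UPLOAD_DIR, file_id), "rb") as f:
        return f.read()


file_id = save_upload(b"hello")
print(read_upload(file_id))
```

Because the id alphabet contains no dots or separators, there is nothing for a traversal payload to work with. The original filename, if you need to show it, can be stored as metadata alongside the id.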
ReDoS — the regex that hangs your service
A regex with catastrophic backtracking can run for exponential time on a carefully chosen input. On a backtracking engine, the pattern (a+)+$ matched against "aaaaaaaaaaaaaaaaaaa!" does work that roughly doubles with every additional a, so an input of a few dozen characters can stall a worker. The CPU pegs at 100%; concurrent requests pile up; the service goes down.
Real incidents from this class are well documented: Cloudflare's 2019 global outage was caused by a WAF rule whose regex backtracked catastrophically, and the Stack Exchange outage in 2016 was similar.
Three defenses, in order of strength:
Use a non-backtracking regex engine. Google's RE2 (the design behind Go's regexp package, and available as a library in most languages) has linear-time guarantees because it matches with finite automata instead of backtracking, and it does not support backreferences. If your patterns can be expressed without backreferences, RE2 is the safe choice.
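Cap regex execution time at the runtime. .NET's Regex accepts a match timeout; Python's third-party regex module supports timeouts; Java does not at the language level but many frameworks bolt one on. JavaScript's built-in RegExp has no timeout option, so in Node the match has to run somewhere it can be interrupted, such as a worker thread that is terminated on a deadline. Where the engine supports it, set a low cap (10-50ms) on user-supplied or user-influenced patterns.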
Audit static patterns for backtracking shapes. Patterns with nested
quantifiers ((a+)*, (a|a)+, (.*)+) and patterns
with overlapping alternatives are the canonical traps. Tools like Owl and recheck can
analyse a regex statically for catastrophic-backtracking risk.
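One way to act on an audit finding, sketched with Python's standard re module (which backtracks and has no timeout): rewrite the nested quantifier into a flat pattern that accepts the same strings, and bound the input length. The cap MAX_INPUT_LEN and the function ends_with_a_run are assumptions for illustration.

```python
import re

NESTED = re.compile(r"(a+)+$")  # nested quantifier: exponential on a near-miss input
FLAT = re.compile(r"a+$")       # accepts the same strings without nested repetition

MAX_INPUT_LEN = 256  # assumption: a cap that suits the field being validated


def ends_with_a_run(text: str) -> bool:
    """True if text ends in one or more 'a' characters; input length is bounded."""
    if len(text) > MAX_INPUT_LEN:
        raise ValueError("input too long")
    return FLAT.search(text) is not None


print(ends_with_a_run("xaaa"), ends_with_a_run("aaa!"))
```

The flat pattern can still take quadratic time under search on a long run of a characters, which is why the length cap stays in place. The rewrite removes the exponential case; it does not make a backtracking engine linear.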
Auth bypass — IDOR and missing checks
Insecure Direct Object Reference. The endpoint GET /api/invoices/12345 is
gated by "user must be logged in" but not by "user must own invoice 12345". Any logged-in
user can change the id and read anyone's invoice.
IDOR is among the most common high-severity findings on bug-bounty platforms. The defense is mechanical: every read of a resource by id must verify the resource belongs to (or is shared with) the authenticated user.
# Vulnerable
@app.route("/api/invoices/<int:invoice_id>")
@login_required
def get_invoice(invoice_id):
    return Invoice.query.get(invoice_id)
# Safe — scope by current_user
@app.route("/api/invoices/<int:invoice_id>")
@login_required
def get_invoice(invoice_id):
    inv = Invoice.query.filter_by(
        id=invoice_id,
        user_id=current_user.id,  # the gate
    ).first_or_404()
    return inv
The ownership filter is most reliable when it is enforced at the framework level rather than remembered in each handler. Some ORMs let you scope queries globally ("every query for Invoice must include a user_id filter"); turn that on if your framework supports it. Some authorization libraries (Cerbos, OPA, Cedar) enforce policy at a layer above the ORM and refuse to return rows that violate policy. Either approach is stronger than relying on every developer remembering the filter.
Related: missing authorization on state-changing endpoints. POST /api/admin/...
must check the user has admin scope, not just that they are logged in. The "is admin" gate
gets forgotten more often than "user must be logged in" because it is the second check,
not the first.
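The forgotten second check is easier to remember when it is a declaration on the handler. Below is a framework-free sketch; User, Forbidden, require_scope, and delete_user are all hypothetical names, and a real application would hook the same idea into its framework's decorator or middleware system.

```python
from dataclasses import dataclass, field
from functools import wraps


class Forbidden(Exception):
    """Raised when an authenticated user lacks the required scope."""


@dataclass
class User:
    id: int
    scopes: set = field(default_factory=set)


def require_scope(scope):
    """Decorator: the wrapped handler runs only if user.scopes contains scope."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user, *args, **kwargs):
            if scope not in user.scopes:
                raise Forbidden(f"missing scope: {scope}")
            return handler(user, *args, **kwargs)
        return wrapper
    return decorator


@require_scope("admin")
def delete_user(user, target_id):
    return f"user {target_id} deleted by {user.id}"


print(delete_user(User(1, {"admin"}), 7))
```

A logged-in user without the admin scope gets Forbidden before the handler body runs, and a reviewer can see at a glance which endpoints carry the gate and which do not.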
XSS — the original, still relevant
Cross-site scripting is injection where the interpreter is the victim's browser. User input gets written into a page without escaping, and the browser runs it as JavaScript in the victim's session. The injected script can read the page, read non-HttpOnly cookies, make requests as the user, and deface or rewrite anything on screen. XSS rolls up into the injection entry on the OWASP Top 10 for a reason — it is the same data-as-code mistake, just pointed at the DOM instead of a database.
It comes in three flavours, and the distinction changes where you defend:
Stored XSS. The malicious input is saved server-side — a comment, a profile bio, a support-ticket body — and served back to every viewer. One injection hits everyone who loads the page, which makes it the most damaging variant. A single stored payload in a high-traffic comment field can harvest sessions at scale.
Reflected XSS. The input is bounced straight back in the response, usually from a query parameter echoed into the page (a search box that prints "no results for your-term"). It needs a victim to click a crafted link, so it is delivered by phishing rather than stored, but the effect once clicked is identical.
DOM-based XSS. The injection never touches the server. Client-side
JavaScript reads from a source the attacker controls (location.hash,
document.referrer) and writes it into a sink such as
element.innerHTML or document.write. Server-side escaping cannot
help here because the dangerous flow lives entirely in the browser; you defend at the sink.
The modern defense layer applies to all three:
Output encoding, in the right context. Escaping is contextual: HTML body, HTML attribute, JavaScript string, URL, and CSS each need different encoding. Most modern templating engines HTML-escape by default (React JSX, Svelte, Handlebars, Vue), and Jinja does so when autoescaping is enabled, as Flask does for HTML templates, so if you write {user.name} the engine encodes it for you. The dangerous APIs are the ones that opt out: dangerouslySetInnerHTML in React, @html in Svelte, {{{ raw }}} in Handlebars, and any direct innerHTML = assignment. Audit every use of these, and when you must render user-supplied HTML, run it through a sanitiser such as DOMPurify rather than trusting your own filter.
Content Security Policy. A CSP header tells the browser "only run scripts from these origins; never run inline scripts unless they carry this nonce." An XSS bug becomes far less useful if the injected script cannot execute. A strict CSP built on nonces or hashes (rather than long origin allow-lists) is the second line of defense and catches the XSS bugs that slip past encoding. It does not replace encoding; the two layers cover each other's gaps.
HttpOnly + Secure + SameSite cookies. If your session cookie cannot be read by JavaScript, an XSS bug cannot exfiltrate it directly. It does not solve everything — the injected script can still act as the user from inside the page — but it removes the simplest session-theft path and is free to set.
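The first two layers can be sketched with the standard library alone. render_comment and csp_header are hypothetical helpers, and the policy string is one plausible strict policy rather than a recommendation for every site; in practice a templating engine does the escaping for you.

```python
import html
import secrets


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment with every untrusted value escaped for HTML context."""
    # html.escape handles & < > and, with quote=True (the default), " and '.
    return (
        f'<div class="comment" data-author="{html.escape(author)}">'
        f"{html.escape(body)}</div>"
    )


def csp_header(nonce: str) -> str:
    """A nonce-based policy: only script tags carrying this nonce may run."""
    return f"script-src 'nonce-{nonce}'; object-src 'none'; base-uri 'none'"


nonce = secrets.token_urlsafe(16)  # fresh per response, never reused
print(render_comment('x" onmouseover="alert(1)', "<script>alert(1)</script>"))
print("Content-Security-Policy:", csp_header(nonce))
```

html.escape is only correct for HTML body and quoted-attribute contexts. A value placed inside a script block, a URL, or CSS needs the encoder for that context instead.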
CSRF — riding the user's own session
Cross-site request forgery is the inverse of XSS. The attacker does not inject script into your site; they get the victim's own browser, while logged into your site, to send a request the victim never intended. Because the browser attaches the session cookie automatically to any request bound for your domain, a hidden form or image tag on the attacker's page can fire a state-changing action — transfer money, change an email, delete an account — using the victim's credentials.
The defenses, layered:
SameSite cookies. Setting your session cookie to SameSite=Lax (or Strict) tells the browser to withhold it from cross-site requests. With Lax the cookie is not sent on cross-site form posts or subresource requests (images, iframes, fetch), which kills the classic hidden-form CSRF vector, but it is still sent on top-level GET navigations. Strict withholds it on those too, at the cost of breaking some legitimate cross-site link flows. Lax is a sensible default for session cookies, and this single attribute removes most CSRF risk for free.
Anti-CSRF tokens. The server embeds a random, per-session (or per-request) token in each form and rejects any state-changing request that does not echo it back. The attacker's page cannot read the token — the same-origin policy stops it — so it cannot forge a valid request. The synchroniser-token pattern is the standard server-rendered defense; the double-submit-cookie pattern is the common variant for APIs that hold no server-side session state.
Verify the request actually came from your site. For APIs, requiring a custom header (which HTML forms cannot set at all, and which cross-site scripts cannot send without passing a CORS preflight) or checking the Origin header on state-changing requests adds a cheap second gate. Note that GET requests must stay side-effect-free for any of this to hold: a CSRF token on a form means nothing if a logged-in GET /delete?id=5 also works.
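A framework-free sketch of the synchroniser-token check combined with an Origin check. The session is modelled as a plain dict, and TRUSTED_ORIGINS, issue_csrf_token, and is_request_allowed are illustrative names; most web frameworks ship an equivalent that should be preferred over hand-rolling.

```python
import hmac
import secrets

TRUSTED_ORIGINS = {"https://app.example.com"}  # assumption: your site's own origin(s)
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def issue_csrf_token(session: dict) -> str:
    """Create a per-session token; embed the returned value in each form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_request_allowed(method, session, submitted_token, origin) -> bool:
    """Accept safe methods; otherwise require a trusted Origin and a matching token."""
    if method in SAFE_METHODS:
        return True  # only sound if these methods never change state
    if origin not in TRUSTED_ORIGINS:
        return False
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


session = {}
token = issue_csrf_token(session)
print(is_request_allowed("POST", session, token, "https://app.example.com"))
```

The attacker's page can trigger the POST but cannot read the token out of your form, so the forged request fails the comparison even when the browser attaches the session cookie.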
Secrets exposure — the leak every breach depends on
Almost every serious breach eventually turns on a credential that should not have been
reachable. An API key committed to a public repo, a database password baked into a Docker
image layer, an access token printed into a log, a .env file served because a
static handler was pointed at the wrong directory. None of these is exotic; they are the
quiet, recurring shape behind the headline incidents. Secrets exposure shows up on the OWASP
Top 10 under security misconfiguration and cryptographic failures, and it is the most
boring class to fix and the most expensive to get wrong.
# How secrets leak, in order of how often it happens
# 1. Committed to git — survives in history even after you delete the line
git log -p | grep -i "api_key\|password\|secret"
# 2. Baked into a container image layer (visible to anyone who pulls it)
# Use build secrets / runtime injection, never COPY .env into the image
# 3. Logged — tokens echoed in request dumps or error traces
logger.info(f"calling api with {headers}") # don't; redact first
# 4. Returned in an API response or error message (stack traces, debug=True)
The fixes are mechanical and cheap once you commit to them:
Keep secrets out of source entirely. Inject them at runtime from a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, the platform's environment injection). The code should read a name, not a value. The dedicated secrets management chapter covers the full pattern.
Scan before you push. A pre-commit hook (gitleaks, trufflehog) and a CI scanner catch the accidental commit before it reaches a remote. Assume any secret that lands in git history is burned and must be rotated, not just removed.
Rotate, and make rotation cheap. Short-lived, automatically rotated credentials limit the blast radius of any leak. A secret that lives for an hour is far less valuable to an attacker than one that lives for three years. The authentication chapter covers how token lifetimes and rotation interact with sessions.
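Two of these habits fit in a few lines: read a secret by name at runtime, and redact before logging. The header list SENSITIVE_HEADERS and the helpers redact_headers and get_secret are assumptions for illustration; the environment lookup stands in for whatever injection mechanism your platform provides.

```python
import os

# Assumption: header names that commonly carry credentials.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of headers that is safe to log."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def get_secret(name: str) -> str:
    """Read a secret by name from the environment; fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        # The error names the secret but never includes a value.
        raise RuntimeError(f"secret {name} is not configured")
    return value


headers = {"Authorization": "Bearer abc123", "Accept": "application/json"}
print(redact_headers(headers))
```

Logging redact_headers(headers) instead of headers keeps the token out of log storage, which usually has a wider audience and a longer retention period than the secret itself was meant to have.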
Vulnerable dependencies and supply chain
Most of the code in your application is not yours. A typical service pulls in hundreds of direct and transitive dependencies, and a known vulnerability in any of them is your vulnerability the moment it ships. This is its own OWASP Top 10 entry — vulnerable and outdated components — and it sits alongside software-and-data integrity failures, which covers the supply-chain attacks where the dependency itself is hostile.
The two shapes to defend against:
Known-vulnerable versions. A CVE is published against a library you depend on; until you upgrade, you are exposed. The Log4Shell incident (CVE-2021-44228) was this shape at planetary scale: a logging library nearly everyone used performed JNDI lookups on strings it logged, which turned a logged string into remote code execution. The fix is to know your bill of materials and watch it. Run a dependency scanner (Dependabot, Snyk, npm audit, pip-audit, OWASP Dependency-Check) in CI, generate an SBOM so you can answer "are we affected" in minutes rather than days, and keep dependencies current enough that the upgrade path is a patch bump and not a rewrite.
Hostile packages. An attacker publishes a malicious package, or compromises a real one, and your build pulls it in. Typosquatting (a package one keystroke off a popular name), dependency confusion (a public package shadowing your internal one), and compromised maintainer accounts are the usual routes. Defenses: pin exact versions and commit a lockfile so builds are reproducible, verify integrity hashes, prefer packages with many maintainers and recent activity, and treat a postinstall script in a new dependency as something to read before you run. Where it matters, run builds in a sandbox with no network and no secrets so a malicious install script has nothing to reach.
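Integrity verification is what a lockfile hash does under the hood, and the idea can be sketched directly. PINNED_SHA256 is computed from sample bytes here purely so the example runs; in practice it would be a value recorded in a lockfile when the dependency was first reviewed. The function verify_artifact is hypothetical.

```python
import hashlib

# Stand-in for a hash pinned in a lockfile at review time.
PINNED_SHA256 = hashlib.sha256(b"example package contents").hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> None:
    """Raise if the downloaded bytes do not match the pinned hash."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"integrity mismatch: expected {expected_sha256}, got {actual}")


verify_artifact(b"example package contents", PINNED_SHA256)  # passes silently
print("artifact matches pinned hash")
```

A package that was swapped or modified after you pinned it fails the check before any of its code runs. Package managers expose the same mechanism, for example pip's hash-checking mode and the integrity field in npm lockfiles.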
Broken access control — the number-one risk
Broken access control is the single largest category on the OWASP Top 10, and IDOR plus missing authorization checks are its two most common faces. The IDOR section above is the read shape; the write shape is just as common and often worse. Both come from the same gap: the code authenticates ("you are logged in") but forgets to authorize ("you may touch this specific thing"). It is the number-one risk precisely because the missing check is invisible in a passing test — the happy path works for the owner, and nobody writes the test where a stranger tries the same id.
Beyond IDOR and the missing admin check, the same class includes: trusting a hidden form field or a client-supplied role, allowing an action because the UI hid the button (the API still accepts it), and failing to re-check authorization after the first request in a multi-step flow. The durable fix is structural — deny by default, enforce authorization in one place rather than scattering checks through every handler, and scope every data access to the authenticated principal at the query layer so a forgotten check fails closed instead of open. The authentication chapter goes deeper on how sessions and identity feed these checks, and the API auth protocols chapter covers how tokens and scopes carry authorization across service boundaries.
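"Deny by default, in one place" can be sketched as a single decision function that every handler calls. The roles, actions, and the ROLE_ACTIONS table are hypothetical; a policy engine would replace this table in a larger system, but the shape is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: int
    role: str


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int


# Hypothetical policy table: (role, action) pairs that are permitted at all.
ROLE_ACTIONS = {
    ("member", "invoice:read"),
    ("member", "invoice:update"),
    ("admin", "invoice:read"),
    ("admin", "invoice:update"),
    ("admin", "invoice:delete"),
}


def is_allowed(principal: Principal, action: str, resource: Invoice) -> bool:
    """Single decision point. Anything not explicitly permitted is denied."""
    if (principal.role, action) not in ROLE_ACTIONS:
        return False  # unknown role or action: fail closed
    if principal.role == "admin":
        return True
    return resource.owner_id == principal.user_id  # members only touch their own


invoice = Invoice(id=1, owner_id=10)
print(is_allowed(Principal(10, "member"), "invoice:read", invoice))
print(is_allowed(Principal(11, "member"), "invoice:read", invoice))
```

A new action or role that nobody added to the table is refused, which is the fail-closed behaviour the paragraph above asks for. This is also the function to point the "stranger tries the same id" test at.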
The five rules that catch most of this
Generalising across the classes above:
1. Separate data from instructions. Parameterise queries, pass argv as a list, use safe deserialisers, escape templates. The most common defense.
2. Allow-list, do not deny-list. Specify what is allowed (image hostnames, column names, file extensions, roles). Reject everything else. Deny-lists always miss a case.
3. Normalise before checking. Resolve paths to canonical form, decode URL encoding, lowercase where case does not matter. Run security checks on the normalised form so attackers cannot bypass via encoding tricks.
4. Bound time and memory on any operation that processes untrusted input. Regex timeouts, query timeouts, JSON depth limits, request body size limits, decompression size limits. This blunts most of the resource-exhaustion denial-of-service family.
5. Authorize every read, not just the first. Authentication ("you are logged in") is not authorization ("you may see this row"). IDOR survives because engineers check the former and forget the latter.
A working app that gets these five right catches the vast majority of vulnerabilities before they become CVEs. Pattern-matching on the shape — "is this input becoming part of an instruction stream somewhere" — is the skill the chapter is trying to install.
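Rule 4 is the one that most often lacks a concrete picture, so here is a minimal sketch of a decompression size limit using the standard zlib module. The function bounded_decompress and the limits in the example are assumptions; the technique is to ask the decompressor for at most one byte more than you are willing to hold.

```python
import zlib


def bounded_decompress(data: bytes, max_bytes: int) -> bytes:
    """Decompress a zlib stream, refusing any output larger than max_bytes."""
    decompressor = zlib.decompressobj()
    # Request at most one byte past the limit so "over the limit" is detectable
    # without ever materialising the full output.
    out = decompressor.decompress(data, max_bytes + 1)
    if len(out) > max_bytes:
        raise ValueError("decompressed size exceeds limit")
    if not decompressor.eof:
        raise ValueError("truncated or incomplete stream")
    return out


print(bounded_decompress(zlib.compress(b"hello"), 1024))
```

A few kilobytes of compressed zeros can expand to many megabytes; with the cap in place the call raises after producing at most max_bytes + 1 bytes instead of allocating whatever the sender chose.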
Further reading
OWASP's Top Ten is the canonical list and each entry links to deeper material. The OWASP Cheat Sheet Series is the practical reference — one short page per defense pattern. PortSwigger's Web Security Academy has interactive labs for almost every class on this page and is the best way to develop the pattern-recognition that pure reading cannot teach. HackerOne's Hacktivity feed is a continuous stream of real bug-bounty reports — read it for a few weeks and you will recognise the shapes faster than any reference document teaches.
Inside this codex, the threat-modeling chapter covers how to surface these threats during design; the authentication chapter covers the auth-bypass shapes in more detail; the secrets-management chapter covers the credential-leak shape that almost every breach eventually depends on.