The CVE classes you actually meet

The CVE database has well over a hundred thousand entries. The dozen or so patterns on this page account for almost all the ones a working web engineer will personally cause, and each one is the same trust-boundary slip.

The one pattern that connects them all

One shape runs through the whole chapter. Each class on this page is a trust-boundary slip: user-controlled data interpreted as code, instructions, paths, or commands somewhere it should have stayed data. Once you can see the shape, the defenses repeat: parameterise, allow-list, normalise, time-cap, never trust the client. The point of this chapter is pattern recognition fast enough that you spot the vulnerable shape in your own diff before it ships.

Every vulnerability class on this page is an instance of one mistake: code somewhere treated untrusted input as something other than data. SQL injection interprets it as SQL. Command injection interprets it as shell. Cross-site scripting interprets it as JavaScript in the browser. Deserialization interprets it as instructions for which objects to construct. SSRF interprets it as a destination your server should connect to. Path traversal interprets it as a filesystem path. ReDoS hands it to a backtracking regex engine that lets the input dictate how much work the match takes. Prototype pollution interprets it as a property path that can reach a shared JavaScript prototype.

The same input, two paths: concatenated into the instruction stream (unsafe), or bound as a parameter (safe). Eight classes, one shape.

The defense in every case is the same shape: separate data from instructions. Use APIs that take data as a parameter rather than letting it become part of the instruction stream. Where that is impossible, validate against a strict allow-list. Where even that is impossible, bound the time and memory the operation can consume. The five rules at the end of this chapter formalise the pattern; the sections in between walk through what each class looks like and what the safe shape is.

It helps to hang these on a map you already know. The OWASP Top 10 is the industry's shorthand for the risks that cause real incidents, and most of this chapter maps onto its 2021 edition directly: broken access control sits at number one, injection (which folds in SQL, command, and cross-site scripting) sits near the top, and cryptographic failures, insecure design, security misconfiguration, vulnerable and outdated components, identification and authentication failures, software and data integrity failures, security logging and monitoring failures, and SSRF round out the list. The point of the Top 10 is not to memorise ten names; it is to recognise that a small set of shapes produces almost every breach, and to wire your defenses to those shapes. For how to design against these classes while planning a feature, see the threat modeling chapter; here we stay close to the code.

SQL injection — the patriarch

Untrusted input gets concatenated into a SQL string. The string is parsed by the database; what you intended as data becomes part of the query.

# Vulnerable
cur.execute(f"SELECT * FROM users WHERE name = '{user_input}'")
# user_input = "x' OR '1'='1"
# query becomes: SELECT * FROM users WHERE name = 'x' OR '1'='1'  -> returns all rows

# Safe — parameterised
cur.execute("SELECT * FROM users WHERE name = %s", (user_input,))
# The driver keeps the query template and the value apart (or escapes the value
# itself), so user_input is always treated as data, never parsed as SQL.

Parameterised queries (also called prepared statements or bound parameters) are the only defense that scales. They work because the database wire protocol distinguishes the SQL template from the parameter values: the parser sees the template once, and the values are bound to it as data. Some drivers emulate this on the client by escaping each value before sending, which is still safe as long as values go through the driver's parameter API. Every modern driver supports them. Use them on every query, not just the obviously dangerous ones; the safe-looking ones are how SQL injection bugs survive code review.

The same hostile input on both paths. Concatenation lets it rewrite the query; a parameterised template fixes the plan first, so the value stays data.

Three places parameterisation does not cover:

Identifiers (table names, column names). Most drivers cannot parameterise these. If your code lets a user pick which column to ORDER BY, do not concatenate — use an allow-list of valid columns and reject anything else.

LIMIT / OFFSET in some drivers. Older MySQL clients required these as integers in the query string. Cast to integer in your code and validate the range before interpolating.

Dynamic query building. Reporting tools that build "WHERE" clauses from user-chosen filters need a query builder (sqlalchemy, knex, sea-orm) that parameterises the values while letting your code shape the structure. Hand-rolled string concatenation here is where the worst SQL injection bugs live.
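The three gaps above can be handled together. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical users table with name, status, and created_at columns: identifiers come only from allow-lists, the limit is cast and range-checked, and every user-supplied value is bound as a parameter. The function name and the allow-lists are illustrative, not from any library.

```python
import sqlite3

# Hypothetical allow-lists: the only identifiers that may appear in the SQL text.
SORTABLE = {"name", "created_at"}
FILTERABLE = {"name", "status"}

def search_users(conn: sqlite3.Connection, filters: dict, order_by: str, limit) -> list:
    if order_by not in SORTABLE:
        raise ValueError(f"cannot sort by {order_by!r}")
    limit = int(limit)  # cast first, then range-check
    if not 1 <= limit <= 100:
        raise ValueError("limit out of range")
    clauses, params = [], []
    for column, value in filters.items():
        if column not in FILTERABLE:
            raise ValueError(f"cannot filter on {column!r}")
        # Column name comes from the allow-list; the value is bound as a parameter.
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT name FROM users WHERE {where} ORDER BY {order_by} LIMIT ?"
    return [row[0] for row in conn.execute(sql, (*params, limit))]
```

The structure of the query is shaped by your code; nothing the user sends is ever spliced into it except through an allow-list lookup or a bound parameter.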

Command injection — same shape, different parser

Untrusted input gets concatenated into a shell command. The shell interprets metacharacters and your innocent grep becomes an arbitrary-command-execution primitive.

# Vulnerable
os.system(f"convert {user_filename} output.png")
# user_filename = "x.png; rm -rf /"
# shell runs: convert x.png ; rm -rf / output.png

# Safe — pass args as a list, no shell involved
subprocess.run(["convert", user_filename, "output.png"], check=True)
# The OS exec syscall takes argv as separate strings; no shell parsing happens.

The rule: prefer APIs that take arguments as a list. Where the underlying tool insists on a shell (some legacy binaries, some pipeline syntax), wrap with shlex.quote (Python) or equivalent in your language; never rely on filtering "dangerous" characters because the list of dangerous characters in a shell is longer than people remember.
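When a shell really is unavoidable, quoting each untrusted value is the fallback. A small sketch, assuming a POSIX /bin/sh and using printf as a hypothetical stand-in for the legacy tool: shlex.quote wraps the value so the shell passes it through as one literal argument instead of interpreting ;, $( ), backticks, or quotes inside it.

```python
import shlex
import subprocess

def run_via_shell(user_value: str) -> str:
    # printf stands in for a tool that has to be invoked through a shell.
    # shlex.quote turns the untrusted value into a single, literal shell word.
    command = f"printf '%s' {shlex.quote(user_value)}"
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout
```

run_via_shell("x.png; echo pwned") returns the string unchanged rather than running echo. shlex.quote targets POSIX shells only; it is not a defense for cmd.exe on Windows, which is one more reason to prefer the argument-list form.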

Variants of the same bug live in: spawn() calls in Node.js with shell:true, backticks and exec() in Ruby/Perl, eval() of any kind anywhere. If your code accepts user input and passes it to anything that might invoke a shell or evaluate code, audit the call chain.

Deserialization — instructions disguised as data

A serialisation format that supports object construction (Java's ObjectInputStream, Python pickle, Ruby Marshal, some YAML loaders, Node's node-serialize) can construct arbitrary classes when it deserialises. An attacker who controls the byte stream controls what gets constructed; carefully chosen objects trigger code execution as side effects of their constructors, destructors, or other hooks the runtime calls during deserialisation.

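import io
import json
import pickle

# Vulnerable Python
data = pickle.loads(request.body)
# attacker sends a pickle stream that constructs an os.system call

# Safe alternative: JSON, which has no class-construction capability
data = json.loads(request.body)

# If you must use pickle (you almost never must), restrict the class set.
# Hypothetical allow-list: only the (module, name) pairs your data really needs.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream references; refuse anything unlisted.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"refused {module}.{name}")

def restricted_loads(raw: bytes):
    return RestrictedUnpickler(io.BytesIO(raw)).load()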

The defense is to never deserialise untrusted input with a format that supports arbitrary construction. JSON has no class-construction capability and is the right default for inbound data. YAML's safe-load variants disable construction. Protobuf and MessagePack are structurally incapable of arbitrary deserialization. Pickle, Java ObjectInputStream, Ruby Marshal, .NET BinaryFormatter, and PHP unserialize are all dangerous and should not see untrusted bytes.

Where you do need rich serialisation (caching internal objects, IPC between your own services), pair it with a MAC, typically HMAC-SHA256, so the receiver verifies the bytes came from a trusted producer before deserialising. The pattern: append HMAC(secret, payload); on receipt, recompute it, compare in constant time, and only then pass the payload to the deserialiser. It is the same idea as a signed cookie, applied to the serialised bytes.
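The sign-then-verify pattern, sketched with Python's standard hmac module. Key handling is an assumption here: in practice the key would come from a secrets manager and be shared only by your own producer and consumer. Verification uses hmac.compare_digest so the check does not leak timing information, and pickle.loads is reached only after the tag matches.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def sign(payload: bytes, key: bytes) -> bytes:
    # Append HMAC-SHA256(key, payload) to the serialised bytes.
    return payload + hmac.new(key, payload, hashlib.sha256).digest()

def verify_and_load(blob: bytes, key: bytes):
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to carry a tag")
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to deserialise")
    # Only bytes produced by a holder of the key ever reach the unpickler.
    return pickle.loads(payload)
```

A valid tag proves the bytes came from someone holding the key; it does not make pickle safe if that producer is ever compromised, so scope the key as tightly as the data.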

SSRF — the most underestimated class

Server-Side Request Forgery: your server makes an HTTP request to a URL the attacker controls. The attacker uses your server as a proxy to reach internal services that the attacker could not reach directly — the cloud metadata endpoint, internal admin consoles, databases on private subnets.

# Vulnerable: webhook URL or "fetch image from URL"
import requests

def webhook(url):
    response = requests.get(url)  # also follows redirects by default
    return response.text

# attacker sends: http://169.254.169.254/latest/meta-data/iam/security-credentials/
# IMDSv1 answers with the instance role's name; appending that name to the URL
# returns the role's temporary AWS credentials.
# (This is the Capital One 2019 breach in one paragraph.)

The defense is layered because no single check catches everything:

Allow-list of hostnames. If the feature is "fetch an image from a URL the user pastes", restrict to a specific set of known image hosts. Almost every "user-pasted URL" feature can work this way; teams reach for "allow any URL" out of laziness.

Block private IP ranges before connecting. Resolve the hostname and check every address it returns against the RFC 1918 private ranges (10/8, 172.16/12, 192.168/16), link-local (169.254/16), loopback (127/8), carrier-grade NAT (100.64/10), 0.0.0.0/8, and the IPv6 equivalents (::1, fc00::/7, fe80::/10, plus IPv4-mapped addresses such as ::ffff:127.0.0.1). Reject if any match. Re-check after redirects; many SSRF bypasses use a redirect from a public host to a private one.

Use IMDSv2 on AWS. IMDSv2 requires a session token that is obtained with a PUT request and sent back in a header on every metadata request. A simple GET-only SSRF primitive cannot complete that handshake, which defeats the simplest SSRF-against-IMDS attack. Set instance metadata to v2-only (tokens required); the Capital One breach used IMDSv1.

Use an outbound proxy with allow-listing. At scale, route all server-side outbound HTTP through a forward proxy that enforces the allow-list at the network layer. Application-layer checks can be bypassed by DNS rebinding (the hostname resolves to a public IP at check time, a private IP at connect time); network-layer checks at the proxy cannot.
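The first two layers can be combined into one pre-flight check. A minimal sketch using only the standard library, assuming a hypothetical ALLOWED_HOSTS set for an image-fetch feature (pass allowed_hosts=None for a feature that genuinely accepts any public host): the URL must be https and name an allowed host, and every address the host resolves to must be globally routable. Python's ipaddress is_global property is False for RFC 1918, loopback, link-local (including 169.254.169.254), carrier-grade NAT, and IPv6 unique-local ranges. On its own this does not stop DNS rebinding; the caller still has to connect to one of the returned addresses rather than re-resolving the name, and repeat the check on every redirect.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allow-list for an "import image from URL" feature.
ALLOWED_HOSTS = {"images.example.com", "cdn.example.org"}

def is_forbidden_address(addr: str) -> bool:
    ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop any IPv6 zone id
    # Judge IPv4-mapped IPv6 (::ffff:127.0.0.1) by its embedded IPv4 address.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_global is False for private, loopback, link-local, CGNAT and ULA ranges.
    return not ip.is_global or ip.is_multicast

def resolve_public_addresses(host: str, port: int) -> list:
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addrs = sorted({info[4][0] for info in infos})
    for addr in addrs:
        if is_forbidden_address(addr):
            raise ValueError(f"{host} resolves to forbidden address {addr}")
    return addrs  # connect to one of these, not to the hostname again

def check_outbound_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> list:
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https URLs are allowed")
    host = parts.hostname or ""  # lowercased, with any "user@" prefix stripped
    if allowed_hosts is not None and host not in allowed_hosts:
        raise ValueError(f"host {host!r} is not on the allow-list")
    return resolve_public_addresses(host, parts.port or 443)
```

Note that parsing with urlsplit rather than string matching is what defeats https://images.example.com@evil.test/, whose real host is evil.test.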

The webhook problem is SSRF in disguise. "Send a webhook to a URL of the user's choice" is exactly the SSRF primitive. Every webhook system needs the same defenses: an allow-list of hosts, a private-IP block, a redirect re-check, and throttled, rate-limited delivery retries. Some teams refuse to ship arbitrary webhooks for this reason and require webhooks to go to a known list of public endpoints.

Prototype pollution — JavaScript's special case

In JavaScript, an object's prototype chain is mutable at runtime. If a recursive merge or path-setter follows user-controlled input into a key called __proto__, it reaches Object.prototype itself rather than a fresh nested object. Anything written there is inherited by every plain object in the program, because nearly all objects share Object.prototype.

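// Vulnerable: a "merge" function that copies all keys without checking __proto__
function merge(target, source) {
  for (const key in source) {
    if (typeof source[key] === "object" && source[key] !== null) {
      // For key "__proto__", target[key] is Object.prototype itself
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      merge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
}

merge({}, JSON.parse(req.body));
// req.body = '{"__proto__": {"isAdmin": true}}'
// JSON.parse creates an own "__proto__" key, so merge walks into Object.prototype.
// Now every object in the program inherits isAdmin = true.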

Real CVEs from this class: lodash before 4.17.12 had this in its defaultsDeep function (CVE-2019-10744), after earlier fixes to merge and mergeWith (CVE-2018-16487). jQuery, async, minimist, and dozens of other npm packages have had variations.

Defenses:

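Use Object.create(null) for maps. Objects created this way have no prototype, so there is no inherited __proto__ accessor: assigning obj["__proto__"] just creates an ordinary own property. A Map avoids the problem entirely for key-value data.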

Reject dangerous keys at the merge boundary. Disallow keys named __proto__, constructor, prototype when copying from untrusted input. Most modern merge libraries do this; check the version.

Freeze Object.prototype. Object.freeze(Object.prototype) at app startup prevents any modification. Some libraries break under this; test before deploying.

Path traversal — the .. that gets you

User input gets used as a filesystem path. The path includes .. to escape the directory the developer intended.

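import os

# Vulnerable
def read_upload_unsafe(user_filename):
    return open(f"/var/www/uploads/{user_filename}").read()
# user_filename = "../../../etc/passwd"
# opens /etc/passwd

# Safe
BASE = os.path.realpath("/var/www/uploads")

def read_upload(user_filename):
    # realpath collapses ".." and follows symlinks before the prefix check
    target = os.path.realpath(os.path.join(BASE, user_filename))
    if not target.startswith(BASE + os.sep):
        raise PermissionError("path escapes base")
    with open(target) as f:
        return f.read()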

The defenses that work:

Resolve to canonical form, then check the prefix. Use the OS's canonicalisation (realpath, Path.resolve) so symlinks and .. are normalised, then verify the result is inside the allowed directory. Naive string checks for ".." miss URL-encoded variants (%2e%2e), Windows backslashes, and normalisation tricks: a filter that strips "../" once turns "....//" back into "../".

Use indirection. Store uploaded files with random ids; never let the user supply the filename for reads. Map id → path server-side; users see only the id.

chroot or container. If the workload is "open whatever file the user names", run it in a chroot or a container with only the expected directory mounted. The defense at the OS layer survives bugs at the application layer.
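The first two defenses, sketched with pathlib (Python 3.9+ for Path.is_relative_to). safe_join resolves the joined path, following symlinks and collapsing .., before comparing whole path components, so an absolute path, a ../ chain, or a symlink pointing out of the base directory all fail closed. UploadStore is a hypothetical helper showing the indirection approach: the server picks the on-disk name and users only ever hold an opaque id. Its in-memory dict stands in for what would normally be a database table.

```python
import secrets
from pathlib import Path

def safe_join(base: str, user_path: str) -> Path:
    root = Path(base).resolve()
    # resolve() collapses ".." and follows symlinks; an absolute user_path replaces root.
    target = (root / user_path).resolve()
    # Compare path components, not string prefixes ("/uploads2" is not inside "/uploads").
    if target == root or not target.is_relative_to(root):
        raise PermissionError(f"{user_path!r} escapes {root}")
    return target

class UploadStore:
    """Hypothetical store: users see opaque ids, never filesystem paths."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self._paths = {}  # id -> server-chosen path; a database table in practice

    def save(self, data: bytes) -> str:
        file_id = secrets.token_urlsafe(16)
        path = self.root / file_id  # the name is generated server-side
        path.write_bytes(data)
        self._paths[file_id] = path
        return file_id

    def load(self, file_id: str) -> bytes:
        path = self._paths.get(file_id)
        if path is None:
            raise KeyError("unknown file id")
        return path.read_bytes()
```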

ReDoS — the regex that hangs your service

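A regex with catastrophic backtracking can run for exponential time on a carefully chosen input. Matching (a+)+$ against a run of n a's followed by "!" forces the engine to try every way of splitting the run between the inner and outer +, roughly 2^n attempts before it gives up, so thirty characters already cost on the order of a billion steps. The CPU pegs at 100%; concurrent requests pile up; the service goes down.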

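ReDoS CVEs turn up regularly in popular libraries, and the class has caused major outages: Cloudflare's 2019 global outage came from a WAF rule whose regex backtracked catastrophically, and the Stack Exchange outage in 2016 was a similar backtracking regex, a whitespace-trimming pattern hit by a post with a very long run of whitespace.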

Three defenses, in order of strength:

Use a non-backtracking regex engine. Google's RE2 (available as a library for most languages; Go's standard regexp package follows the same design) guarantees linear-time matching because it simulates automata instead of backtracking, and pays for that by not supporting backreferences or lookaround. If your patterns can be expressed without those features, RE2 is the safe choice.

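Cap regex execution time at the runtime. .NET's Regex accepts a match timeout, and Python's third-party regex module accepts a timeout argument. JavaScript's RegExp has no timeout option in Node (V8 ships an experimental linear-time engine behind a flag), and Java has none at the language level, though many frameworks bolt one on; on those runtimes, run user-influenced matching in a worker or process you can kill. Where the engine supports it, set a low cap (10-50ms) on user-supplied or user-influenced patterns.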

Audit static patterns for backtracking shapes. Patterns with nested quantifiers ((a+)*, (a|a)+, (.*)+) and patterns with overlapping alternatives are the canonical traps. Tools like Owl and recheck can analyse a regex statically for catastrophic-backtracking risk.
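Python's built-in re module has no timeout, and a running match cannot be cancelled from another thread. One stdlib-only way to cap it is to run the match in a child process with a deadline, sketched below. The worker script and its exit-code convention are assumptions of this sketch. Starting an interpreter per match is expensive and argv cannot carry NUL bytes, so treat this as an illustration of the time cap, or a last resort for rarely run, user-supplied patterns; hot paths are better served by a linear-time engine.

```python
import subprocess
import sys

# Child-process worker: exit 0 on match, 3 on no match (anything else is an error).
_WORKER = (
    "import re, sys\n"
    "sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 3)\n"
)

def search_with_deadline(pattern: str, text: str, seconds: float = 1.0) -> bool:
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _WORKER, pattern, text],
            timeout=seconds,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the runaway child here.
        raise TimeoutError(f"regex exceeded {seconds}s") from None
    if proc.returncode == 0:
        return True
    if proc.returncode == 3:
        return False
    raise ValueError(proc.stderr.decode(errors="replace").strip())
```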

Auth bypass — IDOR and missing checks

Insecure Direct Object Reference. The endpoint GET /api/invoices/12345 is gated by "user must be logged in" but not by "user must own invoice 12345". Any logged-in user can change the id and read anyone's invoice.

IDOR is consistently among the most common high-severity findings on bug-bounty platforms. The defense is mechanical: every read of a resource by id must verify the resource belongs to (or is shared with) the authenticated user.

# Vulnerable
@app.route("/api/invoices/<int:invoice_id>")
@login_required
def get_invoice(invoice_id):
    return Invoice.query.get(invoice_id)

# Safe — scope by current_user
@app.route("/api/invoices/<int:invoice_id>")
@login_required
def get_invoice(invoice_id):
    inv = Invoice.query.filter_by(
        id=invoice_id,
        user_id=current_user.id    # the gate
    ).first_or_404()
    return inv

The pattern fails most often at the framework level. Some ORMs let you scope queries globally ("every query for Invoice must include a user_id filter") — turn that on if your framework supports it. Some authorization libraries (Cerbos, OPA, Cedar) enforce policy at a layer above the ORM and refuse to return rows that violate policy. Either approach is stronger than relying on every developer remembering the filter.
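A minimal sketch of scoping at the data-access layer, using sqlite3 and a hypothetical invoices table with id, user_id, and amount columns. The repository is built once per request from the authenticated user's id and offers no method that queries without that filter, so an individual handler cannot forget it.

```python
import sqlite3

class InvoiceRepo:
    """Hypothetical repository: every query is scoped to one authenticated user."""

    def __init__(self, conn: sqlite3.Connection, user_id: int):
        self._conn = conn
        self._user_id = user_id  # taken from the session, never from the request

    def get(self, invoice_id: int):
        # "Does not exist" and "not yours" both return None, so the caller answers 404
        # either way and does not reveal which ids exist.
        return self._conn.execute(
            "SELECT id, amount FROM invoices WHERE id = ? AND user_id = ?",
            (invoice_id, self._user_id),
        ).fetchone()

    def list_all(self):
        return self._conn.execute(
            "SELECT id, amount FROM invoices WHERE user_id = ? ORDER BY id",
            (self._user_id,),
        ).fetchall()
```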

Related: missing authorization on state-changing endpoints. POST /api/admin/... must check the user has admin scope, not just that they are logged in. The "is admin" gate gets forgotten more often than "user must be logged in" because it is the second check, not the first.

XSS — the original, still relevant

Cross-site scripting is injection where the interpreter is the victim's browser. User input gets written into a page without escaping, and the browser runs it as JavaScript in the victim's session. The injected script can read the page, read non-HttpOnly cookies, make requests as the user, and deface or rewrite anything on screen. XSS rolls up into the injection entry on the OWASP Top 10 for a reason — it is the same data-as-code mistake, just pointed at the DOM instead of a database.

It comes in three flavours, and the distinction changes where you defend:

Stored XSS. The malicious input is saved server-side — a comment, a profile bio, a support-ticket body — and served back to every viewer. One injection hits everyone who loads the page, which makes it the most damaging variant. A single stored payload in a high-traffic comment field can harvest sessions at scale.

Reflected XSS. The input is bounced straight back in the response, usually from a query parameter echoed into the page (a search box that prints "no results for your-term"). It needs a victim to click a crafted link, so it is delivered by phishing rather than stored, but the effect once clicked is identical.

DOM-based XSS. The injection never touches the server. Client-side JavaScript reads from a source the attacker controls (location.hash, document.referrer) and writes it into a sink such as element.innerHTML or document.write. Server-side escaping cannot help here because the dangerous flow lives entirely in the browser; you defend at the sink.

The modern defense layer applies to all three:

Output encoding, in the right context. Escaping is contextual: HTML body, HTML attribute, JavaScript string, URL, and CSS each need different encoding. Every modern templating engine HTML-escapes by default (React JSX, Svelte, Jinja, Handlebars, Vue), so if you write {user.name} the engine encodes it for you. The dangerous APIs are the ones that opt out — dangerouslySetInnerHTML in React, @html in Svelte, {{{ raw }}} in Handlebars, and any direct innerHTML = assignment. Audit every use of these, and when you must render user-supplied HTML, run it through a sanitiser such as DOMPurify rather than trusting your own filter.

Content Security Policy. A CSP header tells the browser "only run scripts from these origins; never run inline scripts unless they carry this nonce." An XSS bug becomes far less useful if the injected script cannot execute. A strict CSP built on nonces or hashes (rather than long origin allow-lists) is the second line of defense and catches the XSS bugs that slip past encoding. It does not replace encoding; the two layers cover each other's gaps.

HttpOnly + Secure + SameSite cookies. If your session cookie cannot be read by JavaScript, an XSS bug cannot exfiltrate it directly. It does not solve everything — the injected script can still act as the user from inside the page — but it removes the simplest session-theft path and is free to set.
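The three layers above, sketched with Python's standard library for a hand-rolled response. The function names and the exact policy string are illustrative assumptions; a real app would lean on its template engine's auto-escaping and its framework's cookie and header helpers. The encoders show why escaping is contextual: html.escape fits HTML text and quoted attributes, a JavaScript string inside a script tag needs JSON encoding plus escaping of <, >, and & so a value cannot close the tag, and a URL query value needs percent-encoding. The CSP uses a fresh nonce per response with 'strict-dynamic' rather than an origin allow-list.

```python
import html
import json
import secrets
from http.cookies import SimpleCookie
from urllib.parse import quote

def for_html(value: str) -> str:
    # HTML body text and quoted attribute values.
    return html.escape(value, quote=True)

def for_js_string(value: str) -> str:
    # A JS string literal inside <script>; escaping <, > and & stops "</script>" ending the block.
    encoded = json.dumps(value)
    return encoded.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")

def for_url_param(value: str) -> str:
    # A single query-string value; safe="" also percent-encodes "/".
    return quote(value, safe="")

def csp_header() -> tuple:
    nonce = secrets.token_urlsafe(16)  # new per response; set it on each <script nonce="...">
    policy = f"script-src 'nonce-{nonce}' 'strict-dynamic'; object-src 'none'; base-uri 'none'"
    return nonce, policy

def session_cookie(session_id: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = session_id
    morsel = cookie["session"]
    morsel["path"] = "/"
    morsel["httponly"] = True  # not readable from document.cookie
    morsel["secure"] = True    # only sent over HTTPS
    morsel["samesite"] = "Lax"
    return morsel.OutputString()  # value for a Set-Cookie header
```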

CSRF — riding the user's own session

Cross-site request forgery is the inverse of XSS. The attacker does not inject script into your site; they get the victim's own browser, while logged into your site, to send a request the victim never intended. Because the browser attaches the session cookie automatically to any request bound for your domain, a hidden form or image tag on the attacker's page can fire a state-changing action — transfer money, change an email, delete an account — using the victim's credentials.

CSRF: the attacker's page makes the victim's authenticated browser fire a request to the bank. The session cookie is attached automatically, so the request looks legitimate.

The defenses, layered:

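SameSite cookies. Setting your session cookie to SameSite=Lax (or Strict) tells the browser not to attach it to cross-site subresource requests or cross-site POST form submissions, which kills the classic CSRF vector. Lax still sends the cookie on top-level GET navigations from other sites, one more reason GET handlers must not change state. Lax is a sensible default for session cookies; Strict is stronger but breaks some legitimate cross-site link flows, because a user following a link from another site arrives without their session. This single attribute removes most CSRF risk for free.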

Anti-CSRF tokens. The server embeds a random, per-session (or per-request) token in each form and rejects any state-changing request that does not echo it back. The attacker's page cannot read the token — the same-origin policy stops it — so it cannot forge a valid request. The synchroniser-token pattern is the standard server-rendered defense; the double-submit-cookie pattern is the common variant for APIs that hold no server-side session state.

Verify the request actually came from your site. For APIs, requiring a custom header (a cross-site form cannot set one, and a cross-site fetch that sets one triggers a CORS preflight your server can refuse) or checking the Origin header on state-changing requests adds a cheap second gate. Note that GET requests must stay side-effect-free for any of this to hold: a CSRF token on a form means nothing if a logged-in GET /delete?id=5 also works.
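The synchroniser-token pattern, sketched with a plain dict standing in for a hypothetical server-side session store. The token is random per session, safe methods are exempt because they must not change state, and the comparison is constant-time.

```python
import hmac
import secrets

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def issue_csrf_token(session: dict) -> str:
    # One random token per session, stored server-side and embedded in each form.
    token = session.get("csrf_token")
    if token is None:
        token = secrets.token_urlsafe(32)
        session["csrf_token"] = token
    return token

def check_csrf(session: dict, method: str, submitted) -> None:
    if method.upper() in SAFE_METHODS:
        return  # safe methods must not change state, so they need no token
    expected = session.get("csrf_token")
    if not expected or not submitted:
        raise PermissionError("CSRF token missing")
    if not hmac.compare_digest(expected.encode(), str(submitted).encode()):
        raise PermissionError("CSRF token invalid")
```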

Secrets exposure — the leak every breach depends on

Almost every serious breach eventually turns on a credential that should not have been reachable. An API key committed to a public repo, a database password baked into a Docker image layer, an access token printed into a log, a .env file served because a static handler was pointed at the wrong directory. None of these is exotic; they are the quiet, recurring shape behind the headline incidents. Secrets exposure shows up on the OWASP Top 10 under security misconfiguration and cryptographic failures, and it is the most boring class to fix and the most expensive to get wrong.

# How secrets leak, in order of how often it happens
# 1. Committed to git — survives in history even after you delete the line
git log -p | grep -i "api_key\|password\|secret"

# 2. Baked into a container image layer (visible to anyone who pulls it)
#    Use build secrets / runtime injection, never COPY .env into the image

# 3. Logged — tokens echoed in request dumps or error traces
logger.info(f"calling api with {headers}")   # don't; redact first

# 4. Returned in an API response or error message (stack traces, debug=True)
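For the logging leak, one low-tech fix is to redact before anything reaches the logger. A minimal sketch, assuming a hypothetical set of header names your service treats as sensitive; extend it to whatever carries credentials in your stack.

```python
import logging

# Hypothetical list: header names whose values must never reach a log line.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}

def redact_headers(headers: dict) -> dict:
    # HTTP header names are case-insensitive, so compare lowercased; returns a copy.
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }

logger = logging.getLogger("outbound")

def log_call(url: str, headers: dict) -> None:
    logger.info("calling %s with %s", url, redact_headers(headers))
```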

The fixes are mechanical and cheap once you commit to them:

Keep secrets out of source entirely. Inject them at runtime from a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, the platform's environment injection). The code should read a name, not a value. The dedicated secrets management chapter covers the full pattern.

Scan before you push. A pre-commit hook (gitleaks, trufflehog) and a CI scanner catch the accidental commit before it reaches a remote. Assume any secret that lands in git history is burned and must be rotated, not just removed.

Rotate, and make rotation cheap. Short-lived, automatically rotated credentials limit the blast radius of any leak. A secret that lives for an hour is far less valuable to an attacker than one that lives for three years. The authentication chapter covers how token lifetimes and rotation interact with sessions.

Vulnerable dependencies and supply chain

Most of the code in your application is not yours. A typical service pulls in hundreds of direct and transitive dependencies, and a known vulnerability in any of them is your vulnerability the moment it ships. This is its own OWASP Top 10 entry — vulnerable and outdated components — and it sits alongside software-and-data integrity failures, which covers the supply-chain attacks where the dependency itself is hostile.

The two shapes to defend against:

Known-vulnerable versions. A CVE is published against a library you depend on; until you upgrade, you are exposed. The Log4Shell incident (CVE-2021-44228) was this shape at planetary scale: Log4j 2, a logging library nearly every Java shop used, expanded lookups such as ${jndi:ldap://...} inside logged strings, so logging an attacker-supplied value could make the server fetch and run the attacker's code. The fix is to know your bill of materials and watch it. Run a dependency scanner (Dependabot, Snyk, npm audit, pip-audit, OWASP Dependency-Check) in CI, generate an SBOM so you can answer "are we affected" in minutes rather than days, and keep dependencies current enough that the upgrade path is a patch bump and not a rewrite.

Hostile packages. An attacker publishes a malicious package, or compromises a real one, and your build pulls it in. Typosquatting (a package one keystroke off a popular name), dependency confusion (a public package shadowing your internal one), and compromised maintainer accounts are the usual routes. Defenses: pin exact versions and commit a lockfile so builds are reproducible, verify integrity hashes, prefer packages with many maintainers and recent activity, and treat a postinstall script in a new dependency as something to read before you run. Where it matters, run builds in a sandbox with no network and no secrets so a malicious install script has nothing to reach.
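Integrity checking is what pip's --require-hashes mode and the integrity fields in npm lockfiles do for you. To illustrate the underlying check, a sketch that streams a downloaded artifact through SHA-256 and refuses it unless the digest matches the pinned value; where the pinned hash comes from (a lockfile you committed) is assumed.

```python
import hashlib
from pathlib import Path

def verify_sha256(path: Path, expected_hex: str) -> None:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large wheels or tarballs are not read into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"{Path(path).name}: sha256 {actual} does not match pinned hash")
```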

Broken access control — the number-one risk

Broken access control is the single largest category on the OWASP Top 10, and IDOR plus missing authorization checks are its two most common faces. The IDOR section above is the read shape; the write shape is just as common and often worse. Both come from the same gap: the code authenticates ("you are logged in") but forgets to authorize ("you may touch this specific thing"). It is the number-one risk precisely because the missing check is invisible in a passing test — the happy path works for the owner, and nobody writes the test where a stranger tries the same id.

Beyond IDOR and the missing admin check, the same class includes: trusting a hidden form field or a client-supplied role, allowing an action because the UI hid the button (the API still accepts it), and failing to re-check authorization after the first request in a multi-step flow. The durable fix is structural — deny by default, enforce authorization in one place rather than scattering checks through every handler, and scope every data access to the authenticated principal at the query layer so a forgotten check fails closed instead of open. The authentication chapter goes deeper on how sessions and identity feed these checks, and the API auth protocols chapter covers how tokens and scopes carry authorization across service boundaries.
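Deny by default in one place can start as small as a policy table. A sketch with hypothetical User and Invoice types and a hypothetical POLICY mapping: any (resource type, action) pair without an explicit rule is refused, so a new endpoint added without a rule fails closed. Policy engines such as Cerbos, OPA, and Cedar, mentioned earlier, play the same role at larger scale.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class User:
    id: int
    roles: frozenset = frozenset()

@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int

Rule = Callable[[User, object], bool]

# Hypothetical policy: (resource type, action) -> rule. Anything absent is denied.
POLICY: Dict[Tuple[str, str], Rule] = {
    ("invoice", "read"): lambda u, inv: inv.owner_id == u.id or "admin" in u.roles,
    ("invoice", "delete"): lambda u, inv: "admin" in u.roles,
}

class Forbidden(Exception):
    pass

def authorize(user: User, action: str, resource_type: str, resource: object) -> None:
    rule = POLICY.get((resource_type, action))
    if rule is None or not rule(user, resource):
        raise Forbidden(f"{action} on {resource_type} denied")
```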

The five rules that catch most of this

Generalising across the classes above:

1. Separate data from instructions. Parameterise queries, pass argv as a list, use safe deserialisers, escape templates. The most common defense.

2. Allow-list, do not deny-list. Specify what is allowed (image hostnames, column names, file extensions, roles). Reject everything else. Deny-lists always miss a case.

3. Normalise before checking. Resolve paths to canonical form, decode URL encoding, lowercase where case does not matter. Run security checks on the normalised form so attackers cannot bypass via encoding tricks.

4. Bound time and memory on any operation that processes untrusted input. Regex timeouts, query timeouts, JSON depth limits, request body size limits, decompression size limits. This shuts down most of the algorithmic and resource-exhaustion denial-of-service family.

5. Authorize every read, not just the first. Authentication ("you are logged in") is not authorization ("you may see this row"). IDOR survives because engineers check the former and forget the latter.
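Rule 4 in practice, for one of the less obvious limits: a few kilobytes of compressed input can expand to gigabytes. A sketch using the max_length argument of zlib's decompress so the decompressor never produces more than the limit plus one byte; the limit itself is an assumed, per-endpoint value, and truncated streams are refused as well.

```python
import zlib

def bounded_decompress(data: bytes, max_size: int) -> bytes:
    d = zlib.decompressobj()
    # Ask for at most max_size + 1 bytes; getting that extra byte means "too big".
    out = d.decompress(data, max_size + 1)
    if len(out) > max_size:
        raise ValueError("decompressed size exceeds limit")
    if not d.eof:
        # Output stayed under the cap but the stream never finished: truncated input.
        raise ValueError("incomplete compressed stream")
    return out
```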

A working app that gets these five right catches the vast majority of vulnerabilities before they become CVEs. Pattern-matching on the shape — "is this input becoming part of an instruction stream somewhere" — is the skill the chapter is trying to install.

Further reading

Threat modeling for engineers