
API design best practices

Most of the decisions that determine how an API ages are not about the protocol you pick. They are the cross-cutting choices every endpoint shares: how resources are named, which HTTP method does what, what an error looks like on the wire, how lists paginate, how a client retries safely, and how rate limits surface. None of this is hard. A small amount of deliberate design here saves years of cleanup later, because a public API is the one part of a system you cannot quietly refactor once people depend on it. This page walks the whole set with the depth a senior engineer needs to make the calls and defend them in review.


Resource naming and URL design

A URL is the most permanent thing you publish. Long after you have rewritten the handler, swapped the database, and changed teams twice, the path is still in someone's code. So spend the design effort here. The rule that travels furthest: model your API as a set of nouns (resources) and let HTTP verbs do the acting. A charge, a customer, an invoice — those are resources, each with a stable identity and a predictable address. The verb you reach for is the HTTP method, not a word in the path. POST /charges creates a charge; GET /charges/ch_001 reads one. You should almost never need a path like /createCharge or /getChargeById, because the method already carries that meaning.

A few conventions are worth holding the line on, because mixing them inside one API is the fastest way to make it feel cheap:

  • Plural collection nouns. /charges, not /charge. The collection is a list; an item is addressed by id underneath it. Pick plural and never deviate.
  • Lowercase, hyphenated, no file extensions. /payment-methods, not /PaymentMethods or /payment_methods.json. The response format belongs in the Accept header, not the path.
  • Nest only to show ownership, and only one level. /customers/cus_01/payment-methods reads well. Three or four levels of nesting become impossible to route and impossible to remember; once a child has its own stable id, give it a top-level path too.
  • Identifiers are opaque. Treat ids as strings the client never parses. Prefixed ids (ch_, cus_) are a small kindness: a log line tells you what kind of thing went wrong without a schema lookup.

The same idea extends to the few cases that are not CRUD. A "refund this charge" action is not a field you PATCH; it is a sub-resource you create: POST /charges/ch_001/refunds. Treating the action as a thing that gets created keeps the model consistent and, as a bonus, gives the action its own id, its own audit trail, and a natural place to attach an idempotency key. Reserve true RPC-style verb endpoints for the handful of operations that resist the noun model, and name them honestly when you do.

HTTP methods and status codes

HTTP already encodes a great deal of intent. Using it correctly means clients, proxies, caches, and your own retry logic all behave the way they were built to without extra configuration. The methods split along two axes that matter for correctness: whether they leave state unchanged (safe), and whether running them twice has the same effect as running them once (idempotent).

Method | Use for                                      | Safe | Idempotent
GET    | Read a resource or collection. Never mutate. | yes  | yes
POST   | Create a resource, or trigger an action      | no   | no
PUT    | Replace a resource wholesale at a known id   | no   | yes
PATCH  | Apply a partial update                       | no   | no*
DELETE | Remove a resource                            | no   | yes

Those properties are a contract, not trivia. A GET must be free of side effects, because browsers prefetch them, proxies cache them, and crawlers follow them. A PUT that replaces a resource at a known id is idempotent, so a client that times out can simply send it again. A naive POST that creates a resource is not idempotent, which is the whole reason idempotency keys exist further down this page. (PATCH is idempotent only if the patch itself is, such as "set status to closed"; a relative patch like "add 5 to the balance" is not, which is a good reason to prefer absolute updates.)

Status codes deserve the same care. Clients branch on them, alerting systems count them, and retry libraries decide whether to try again based on the class. The full list is large; the set you actually reach for is small.

Code            | Meaning                           | When
200 / 201 / 204 | OK / Created / No Content         | Success. 201 on create, 204 when there is nothing to return
400             | Bad Request                       | The request is malformed or fails validation
401 / 403       | Unauthorized / Forbidden          | Not authenticated, versus authenticated but not allowed
404             | Not Found                         | No such resource, or you are hiding its existence on purpose
409             | Conflict                          | State clash: duplicate, version mismatch, idempotency-key reuse
422             | Unprocessable Entity              | Well-formed but semantically invalid
429             | Too Many Requests                 | Rate limited; pair with Retry-After
500 / 503       | Server Error / Unavailable        | Your fault. 503 when it is temporary and worth a retry

The two distinctions people most often get wrong are 401 versus 403 (who you are versus what you may do) and 400 versus 422 (the bytes are wrong versus the meaning is wrong). Getting them right lets a client write one error handler instead of a special case per endpoint. And one rule that prevents a class of outages: a 4xx other than 429 means "do not retry this as-is," while a 5xx or a 429 means "you may retry, ideally with backoff." If your server returns 400 for a transient internal failure, well-behaved clients will give up on something they should have retried.
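The retry rule can be written down as a single predicate. This is a minimal sketch of the rule as stated on this page; the function name should_retry is hypothetical, and a real client would also cap the number of attempts.

```python
def should_retry(status: int) -> bool:
    """Apply the rule from the text: 429 and 5xx may be retried, other codes may not."""
    if status == 429:
        # Rate limited: the request itself was fine, so sending it later can succeed.
        return True
    # Server-side failures may be transient; 2xx, 3xx, and other 4xx are not retried.
    return 500 <= status <= 599


if __name__ == "__main__":
    for code in (200, 400, 404, 422, 429, 500, 503):
        print(code, should_retry(code))
```

Keeping this decision in one place is what lets a client have one error handler rather than a special case per endpoint.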

One status code per outcome. Do not return 200 with an error in the body ("status": "error") for a request that failed. Tooling, dashboards, and retry logic all read the HTTP status first. A successful-looking response that actually failed is the bug that hides for months.

Pagination — cursors over offsets

Offset pagination (?page=5&per_page=20) feels natural but breaks under load. The database has to skip the first N rows for every page, which gets expensive past a few hundred thousand. And if rows are inserted or deleted between page loads, items shift across pages — users see duplicates and missing entries.

Cursor pagination uses an opaque token that encodes "where I left off". The server decodes it, queries past that position, and returns a new cursor for the next page. The difference is not a micro-optimisation; it is the line between a list endpoint that stays fast at the millionth row and one that gets slower the deeper anyone scrolls.

Figure: offset reads and discards everything before the page (OFFSET 100000 LIMIT 20 scans 100,000 rows to keep 20); a cursor seeks straight to the boundary (WHERE id > 'ch_020' LIMIT 20 is an index seek at any depth) and reads only what it returns.
# request
GET /charges?limit=20

# response
{
  "data": [ { "id": "ch_001" }, ..., { "id": "ch_020" } ],
  "next_cursor": "eyJpZCI6ImNoXzAyMCJ9",
  "has_more": true
}

# next page
GET /charges?limit=20&cursor=eyJpZCI6ImNoXzAyMCJ9

The cursor is just {"id": "ch_020"} base64-encoded — but treat it as opaque to the client so you can change the encoding later. The server-side query becomes WHERE id > 'ch_020' ORDER BY id LIMIT 20, which is index-cheap at any depth. Two details make a cursor scheme correct rather than merely fast. First, the sort must be on a unique, monotonic column, or a column plus a unique tiebreaker (sort by created_at, break ties on id); otherwise two rows with the same timestamp can straddle a page boundary and one gets skipped or repeated. Second, encode everything the query needs into the cursor — the sort field, the direction, the last seen values — so the server is stateless and you never have to remember a client's position on your side.
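The cursor scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: it uses Python's sqlite3 module, a hypothetical charges table with a single id column, and ids that sort lexicographically (zero-padded, as in ch_020). The function names are invented for the example.

```python
import base64
import json
import sqlite3
from typing import Optional


def encode_cursor(last_id: str) -> str:
    """Pack the last seen id into an opaque, URL-safe token."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> str:
    """Recover the last seen id from a token produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def list_charges(db: sqlite3.Connection, limit: int = 20,
                 cursor: Optional[str] = None) -> dict:
    """Return one page of charges together with the page envelope."""
    # An empty string sorts before every id, so it stands in for "from the start".
    after = decode_cursor(cursor) if cursor else ""
    # Ask for one extra row: if it comes back, there is another page.
    rows = db.execute(
        "SELECT id FROM charges WHERE id > ? ORDER BY id LIMIT ?",
        (after, limit + 1),
    ).fetchall()
    has_more = len(rows) > limit
    page = rows[:limit]
    return {
        "data": [{"id": row[0]} for row in page],
        "next_cursor": encode_cursor(page[-1][0]) if has_more else None,
        "has_more": has_more,
    }
```

Fetching limit + 1 rows is one way to set has_more without a second query; sorting on a non-unique column such as created_at would also need the id tiebreaker in both the WHERE clause and the cursor.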

Offset is not always wrong. If a dataset is small and bounded, or you need to jump to "page 47" of a fixed report, offset is simpler and fine. The failure mode is reaching for offset by default on a collection that grows without limit. The honest summary: cursors for anything that scales, offset only where the total is small and the "jump to page" affordance is worth the cost.

Always return the page envelope. Even on the first request, return next_cursor and has_more rather than making clients infer the end from a short page. Without an explicit flag, a final page that happens to hold exactly limit items is indistinguishable from a full page with more behind it.

Errors — RFC 9457 problem details

RFC 9457 (originally RFC 7807) defines an error envelope for HTTP APIs. Adopt it. Every error your API emits should follow this shape, with the type URI as a stable machine-readable code:

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/problem+json

{
  "type":     "https://api.example.com/problems/invalid-currency",
  "title":    "Unsupported currency",
  "status":   422,
  "detail":   "The requested currency 'XYZ' is not supported.",
  "instance": "/charges/ch_001",
  "field":    "currency",
  "request_id": "req_5f9a..."
}
Figure: one envelope, every error. The machine switches on type; the human reads title and detail; field tells them what to fix; support traces request_id in the logs.

A good error has four properties:

  • Stable machine-readable code. The type member, a URI. Clients switch on this; never on title or detail.
  • Human-readable explanation. What happened and why, in plain English, safe to surface to a user or developer.
  • Actionable detail. Which field, what value, what to fix.
  • A request ID. So the developer can ask you about it. See the Request IDs section below.

The point of a stable type is that it is a promise. Once a client ships code that branches on invalid-currency, that string is now part of your contract as surely as any field name. You can improve the title and detail wording freely — they are for humans and nobody should parse them — but the type is forever. Keep a registry of your error types the same way you keep a list of endpoints, and treat removing or renaming one as a breaking change that belongs on the versioning roadmap, not a quiet patch.

For validation failures, return all the problems at once, not the first one. A form with four bad fields should come back with four entries, so the client can highlight every field in one round trip instead of playing whack-a-mole. RFC 9457 does not define a dedicated member for this, but it permits extension members, and its own examples carry an errors array of per-field entries. And never leak internals in the detail: stack traces, SQL, and internal hostnames belong in your logs keyed by the request id, not in a response a stranger can read.
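As an illustration of returning every validation failure in one envelope, here is a small sketch. The validation-failed type slug, the supported currency set, and the helper names are assumptions made for the example; the envelope fields mirror the sample response earlier in this section.

```python
from typing import Optional

# Base URI taken from the sample response above; the slugs under it are hypothetical.
PROBLEM_BASE = "https://api.example.com/problems/"
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}  # illustrative only


def validate_charge(payload: dict) -> list:
    """Collect every validation failure instead of stopping at the first one."""
    errors = []
    amount = payload.get("amount")
    if not isinstance(amount, int) or amount <= 0:
        errors.append({"field": "amount",
                       "detail": "must be a positive integer in minor units"})
    if payload.get("currency") not in SUPPORTED_CURRENCIES:
        errors.append({"field": "currency", "detail": "unsupported currency"})
    return errors


def problem(slug: str, title: str, status: int, detail: str,
            request_id: str, errors: Optional[list] = None) -> dict:
    """Build a problem-details body; serve it as application/problem+json."""
    body = {
        "type": PROBLEM_BASE + slug,
        "title": title,
        "status": status,
        "detail": detail,
        "request_id": request_id,
    }
    if errors:
        body["errors"] = errors  # extension member carrying one entry per bad field
    return body
```

A handler would call validate_charge first and, if the list is non-empty, return problem("validation-failed", "Validation failed", 422, ...) with the list attached.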

Rate limits

If your API is public, it has rate limits whether you advertise them or not: at minimum those imposed by your CDN, your reverse proxy, and your origin's capacity. Make them explicit. Send the current state on every response:

HTTP/1.1 200 OK
RateLimit-Limit:     100
RateLimit-Remaining: 87
RateLimit-Reset:     12          # seconds until the bucket refills

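When a client is over the limit, return 429 Too Many Requests with a Retry-After header indicating how long to wait. The header names above follow the IETF RateLimit headers draft, which is still converging on a standard form; until it settles, GitHub's X-RateLimit-* shape is widely implemented and a reasonable choice. Whichever you pick, document the unit of the reset value: the example above counts seconds remaining, while GitHub's X-RateLimit-Reset is a UTC epoch timestamp.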

The 429 plus Retry-After pair is a contract, and both sides have to keep it. The server promises that waiting the stated time will help; the client promises to actually wait rather than hammering the endpoint. A client that retries a 429 immediately is just turning a soft limit into a hard outage. The right client behaviour is to honour Retry-After when present, and otherwise back off exponentially with jitter so a fleet of clients does not all wake up and retry in lockstep. If you want to feel the difference between a token bucket that absorbs bursts and a fixed window that drops them, the rate limiter simulator lets you turn the knobs and watch requests pass or fail in real time.
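The client half of that contract might look like the following sketch. It assumes Retry-After arrives as a number of seconds (the HTTP-date form falls through to backoff here), and the base and cap values are arbitrary choices, not recommendations from any standard.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 0.5, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honour the server's instruction when it is given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number (e.g. an HTTP-date): fall back to backoff below
    # Exponential growth, capped so the wait never becomes unbounded.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: a uniform draw in [0, ceiling] keeps a fleet from retrying in lockstep.
    return random.uniform(0.0, ceiling)
```

A caller would sleep for retry_delay(attempt, response_headers.get("Retry-After")) between attempts and stop after a fixed number of tries.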

A couple of design choices matter more than the exact header names. Rate limit per principal (the API key or user), not per IP, or one customer behind a shared gateway can starve another. And decide whether you are shaping bursts or enforcing a hard ceiling: a token bucket lets a client spend a saved-up burst and is friendlier to bursty real workloads, while a fixed window is simpler but creates a stampede at the boundary of each window. Most public APIs land on a token bucket for exactly that reason.
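A token bucket keyed by principal fits in a few lines. This is a minimal in-memory sketch with the clock passed in explicitly so it is easy to test; the capacity of 5 and refill rate of one token per second are made-up numbers, and a production limiter would keep its state in a shared store.

```python
class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full, so a fresh client can burst
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when over the limit."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


buckets = {}  # one bucket per API key, not per IP


def allow_request(api_key: str, now: float) -> bool:
    """Rate limit per principal: each key draws from its own bucket."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(capacity=5, refill_per_second=1.0, now=now)
    return buckets[api_key].allow(now)
```

Because each key has its own bucket, one customer exhausting a burst does not affect another behind the same gateway.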

Idempotency keys and safe retries

Networks fail in the worst possible way: the request arrives, the server does the work, and the response is lost on the way back. The client cannot tell that apart from a request that never landed, so it retries — and now you have charged the card twice. The fix is an Idempotency-Key header. The client generates a unique key per logical operation and sends it with every attempt. The server stores the result of the first request under that key for some window (24 hours is typical) and replays the same response on any retry. This is the same principle that makes distributed systems survivable; the deeper treatment lives on the idempotence page, and on the REST page in API terms.

Figure: sequence between client, server, and key store. The client sends POST /charges with Idempotency-Key: k_42; the server finds k_42 unseen, reserves it, does the work, stores the response under k_42, and returns 201 Created ch_001, but the response is lost. The retry carries the same key, so the server replays the stored response (201 Created ch_001, same body) instead of charging the card a second time.

Getting this right means thinking about the gaps between steps. The "reserve, do work, store" sequence has to be atomic enough that two retries racing each other cannot both slip through. The usual pattern is to insert the key into a unique-constrained table before doing the work: the first request wins the insert and proceeds; a concurrent duplicate hits the constraint and waits for, then replays, the stored result. Done this way the database, not your application code, enforces the at-most-once guarantee.

Two non-obvious rules:

  • Hash the request body and store the hash with the cached response. If a later call uses the same key with a different body, return 409 — that's a client bug masquerading as a retry, and replaying the old response would hide it.
  • Document the dedup window. Clients that retry beyond it (e.g. resuming a job after 48 hours) need to know to generate a fresh key, and that an old key may no longer be deduped.

Scope the key correctly too. An idempotency key is meaningful within one authenticated account and one endpoint; the same random string from two different customers must never collide. The practical key is therefore "account id plus endpoint plus the client's key," even though the client only sends the last part. And only require keys where a duplicate does harm — creating a charge, sending mail, provisioning a resource. A GET needs no key because it changes nothing, and a PUT to a known id is already idempotent by construction.
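The reserve-then-work pattern, the body hash, and the account-plus-endpoint scoping can be sketched together. Everything here is an assumption made for illustration: the table layout, the handle function, and the use of sqlite3. It also simplifies two things the text describes: a duplicate that arrives while the first attempt is still running gets a 409 rather than waiting, and expiry of the dedup window is left out.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE idempotency (
    account_id TEXT NOT NULL,
    endpoint   TEXT NOT NULL,
    idem_key   TEXT NOT NULL,
    body_hash  TEXT NOT NULL,
    response   TEXT,
    PRIMARY KEY (account_id, endpoint, idem_key)
)
"""


def handle(db, account_id, endpoint, idem_key, body, do_work):
    """Run do_work(body) at most once per (account, endpoint, key)."""
    body_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    scope = (account_id, endpoint, idem_key)
    try:
        # Reserve the key before doing any work; the primary key lets one writer win.
        with db:
            db.execute("INSERT INTO idempotency VALUES (?, ?, ?, ?, NULL)",
                       scope + (body_hash,))
    except sqlite3.IntegrityError:
        stored_hash, response = db.execute(
            "SELECT body_hash, response FROM idempotency "
            "WHERE account_id = ? AND endpoint = ? AND idem_key = ?", scope,
        ).fetchone()
        if stored_hash != body_hash:
            return 409, {"title": "Idempotency key reused with a different body"}
        if response is None:
            # Simplification: a fuller version would wait for the first attempt.
            return 409, {"title": "A request with this key is still in progress"}
        status, payload = json.loads(response)
        return status, payload  # replay, no new work
    status, payload = do_work(body)
    with db:
        db.execute(
            "UPDATE idempotency SET response = ? "
            "WHERE account_id = ? AND endpoint = ? AND idem_key = ?",
            (json.dumps([status, payload]),) + scope,
        )
    return status, payload
```

If do_work raises, the reservation in this sketch stays in place with no stored response; a fuller version would delete or expire it so the client can retry.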

Request IDs

Every request gets a unique ID. The server logs it. The server returns it on every response (header and error envelope). The client can quote it when reporting an issue. Internal services pass the same ID through their own logs, so a single string lets you follow a single request through every system it touched.

# client may supply one; if not, server generates
Request-ID: req_5f9a4c8e7d3b2a1f8e7d6c5b4a39281

# response always echoes
HTTP/1.1 200 OK
Request-ID: req_5f9a4c8e7d3b2a1f8e7d6c5b4a39281

If you're already using W3C Trace Context, use that instead — traceparent and tracestate headers cover the same need plus distributed tracing semantics. The traceparent trace ID is itself a perfectly good request ID for log correlation.
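One way to pick the id that gets logged and echoed is sketched below. The precedence (traceparent first, then a client-supplied Request-ID, then a generated one), the character allowlist, and the function name are assumptions for the example; it also assumes header names have already been normalised to the spellings used here.

```python
import re
import secrets

# W3C Trace Context: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(r"^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$")
# Accept only short, log-safe client ids (hypothetical policy).
SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def resolve_request_id(headers: dict) -> str:
    """Return the id to log and echo back for this request."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match and match.group(1) != "0" * 32:  # an all-zero trace id is invalid
        return match.group(1)
    supplied = headers.get("Request-ID", "")
    if SAFE_ID.fullmatch(supplied):
        return supplied
    # Nothing usable from the client: generate a fresh id.
    return "req_" + secrets.token_hex(16)
```

Rejecting client ids that contain spaces or control characters keeps a caller from injecting content into your log lines.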

Filtering, sorting, partial responses

Three smaller patterns worth standardising up front:

  • Filtering. Query parameters that match field names — ?status=open&currency=USD. For ranges, suffixes: ?created_after=...&created_before=.... Avoid clever DSLs; every client has to learn them.
  • Sorting. A single ?sort=-created_at,id parameter. Hyphen prefix for descending. Stable: a tiebreaker like id keeps page boundaries deterministic.
  • Partial responses. A ?fields=id,amount,status parameter that whitelists fields (a sparse fieldset). Saves bandwidth on large objects and lets you deprecate fields gracefully — clients that never asked for a field do not notice when it goes away.

The common thread is restraint. Each of these is a small, guessable surface that a client can learn once and apply to every collection. The moment you invent a bespoke query language — nested boolean operators, a custom filter grammar in a string parameter — you have built a second API inside your API that nobody can use without reading a manual, and that you now have to parse safely. If you need rich querying, that is a sign the use case wants a real query endpoint or a different protocol, not a clever string parameter bolted onto a list.
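Parsing the sort and fields parameters takes very little code, which is part of the argument for keeping them simple. In this sketch the SORTABLE allowlist and both function names are hypothetical, and appending id as a tiebreaker is one possible policy.

```python
SORTABLE = {"created_at", "amount", "id"}  # hypothetical allowlist of sortable fields


def parse_sort(value: str) -> list:
    """Turn '-created_at,id' into [('created_at', 'DESC'), ('id', 'ASC')]."""
    order = []
    for part in value.split(","):
        part = part.strip()
        descending = part.startswith("-")
        name = part[1:] if descending else part
        if name not in SORTABLE:
            # Reject unknown names rather than passing them into a query.
            raise ValueError(f"cannot sort by {name!r}")
        order.append((name, "DESC" if descending else "ASC"))
    # Ensure a unique tiebreaker so page boundaries stay deterministic.
    if all(name != "id" for name, _ in order):
        order.append(("id", "ASC"))
    return order


def select_fields(resource: dict, fields: str) -> dict:
    """Keep only the fields named in a '?fields=id,amount' style parameter."""
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    return {name: resource[name] for name in wanted if name in resource}
```

Checking names against an allowlist also means the parsed result can be used to build an ORDER BY clause without interpolating raw client input.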

Good defaults, nullability, consistency

The fastest way to make an API feel professional is to be boringly consistent. Pick one casing for field names (snake_case or camelCase) and never mix them. Use one format for timestamps everywhere — RFC 3339 / ISO 8601 in UTC, 2026-06-07T14:30:00Z — not a Unix integer here and a date string there. Represent money as an integer count of minor units (cents) plus a currency code, never a float, because floating-point cents quietly lose money. These are not matters of taste once you have picked them; the value is that a client who has parsed one of your responses can parse all of them.
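A short sketch shows why floats are the wrong type for money and how the timestamp rule can be enforced in one helper. The function names and the shape of the money object are illustrative assumptions.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as RFC 3339 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def money(amount_minor: int, currency: str) -> dict:
    """Represent money as an integer count of minor units plus a currency code."""
    if isinstance(amount_minor, bool) or not isinstance(amount_minor, int):
        raise TypeError("amount must be an integer number of minor units")
    return {"amount": amount_minor, "currency": currency}


if __name__ == "__main__":
    # Binary floating point cannot represent 0.1 or 0.2 exactly.
    print(0.1 + 0.2 == 0.3)   # False
    # The same sum in integer cents is exact.
    print(10 + 20 == 30)      # True
    print(format_timestamp(datetime(2026, 6, 7, 14, 30, tzinfo=timezone.utc)))
```

Routing every response through helpers like these is what makes the format a property of the API rather than of individual handlers.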

Nullability deserves an explicit policy. Decide, per field, whether absent and null mean the same thing, and write it down. The cleanest rule: omit a field that has no value rather than sending null, and reserve null for "this field exists and is deliberately empty." For collections, return an empty array, never null — a client iterating a list should never have to null-check it first. Small as it sounds, inconsistent nullability is one of the most common sources of client crashes.

Defaults are part of the contract too. If limit defaults to 20, document it and cap it, so a client cannot ask for a million rows and take the service down. If a new field is added later, give it a default that preserves the old behaviour, so existing callers see no change. The guiding idea is to favour conservative choices: when in doubt, the default should be the safe, small, backward-compatible one, and new capability should be opt-in.

Versioning and backward compatibility

You will change the API. The only question is whether your changes break the people who depend on it. The discipline that keeps an API stable is knowing which changes are safe and which are not. Adding an optional field, adding a new endpoint, adding a new enum value the client can ignore, adding a new optional query parameter — these are additive and safe. Removing a field, renaming one, tightening validation, changing a type, changing the meaning of an existing value, or making an optional field required — these break callers and need a new version.

The corollary on the client side is just as important, and worth telling your consumers plainly: read tolerantly. A client that rejects a response because it contains a field it did not expect turns every additive, safe change you make into a breakage on their end. Ignore unknown fields, treat unknown enum values as a documented "other," and you can both keep moving. The mechanics of how to express versions — in the path, a header, or a date stamp — and how to run two versions side by side, are the whole subject of the versioning page; the practice that matters here is to make additive changes the default and reserve a version bump for the rare change that cannot be additive.
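On the client side, reading tolerantly can be as simple as the following sketch. The Charge shape and the set of known status values are hypothetical; the point is only that unknown fields are dropped and unknown enum values collapse to "other".

```python
from dataclasses import dataclass, fields

KNOWN_STATUSES = {"pending", "succeeded", "failed"}  # hypothetical enum values


@dataclass
class Charge:
    id: str
    amount: int
    status: str


def parse_charge(payload: dict) -> Charge:
    """Build a Charge from a response body, tolerating additive changes."""
    known = {f.name for f in fields(Charge)}
    # Ignore fields this client version does not know about.
    data = {key: value for key, value in payload.items() if key in known}
    # Map enum values added after this client shipped to a documented fallback.
    if data.get("status") not in KNOWN_STATUSES:
        data["status"] = "other"
    return Charge(**data)
```

A client written this way keeps working when the server adds a field or a new status value, which is exactly what makes those changes safe to ship.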

OpenAPI from day one

Whatever protocol you use elsewhere, write an OpenAPI document for your HTTP/JSON APIs from the start. Three things become possible:

  • Generated SDKs. Tools like Speakeasy, Fern, and the open-source OpenAPI Generator produce typed clients in every major language without you writing a single SDK.
  • Documentation that doesn't drift. The schema is the source of truth; the docs site reads from it. Adding an endpoint and forgetting to document it stops being a possible failure mode.
  • Server validation. Many frameworks can read the OpenAPI document and reject invalid requests at the edge before they reach your handlers.

The deeper win is treating the document as the contract rather than as documentation generated after the fact. When the OpenAPI file is the source of truth, you can diff it in code review and see, mechanically, whether a change is additive or breaking — which turns the backward-compatibility rules above from a habit you hope people remember into a check a tool runs. Whether you write the spec first and generate handlers from it, or generate the spec from annotated code, the test is the same: if the spec and the running server ever disagree, that is a bug, and it should fail a build, not surprise a customer.
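To make the idea of a mechanical compatibility check concrete, here is a deliberately simplified sketch. It does not parse OpenAPI; it compares two hypothetical flat descriptions of a request body, each mapping a field name to a dict with a type and an optional required flag, and reports the changes this page classifies as breaking.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List changes between two request schemas that would break existing callers."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"changed type: {name}")
        elif new[name].get("required") and not spec.get("required"):
            problems.append(f"now required: {name}")
    for name, spec in new.items():
        # A brand-new optional field is additive; a new required one is not.
        if name not in old and spec.get("required"):
            problems.append(f"new required field: {name}")
    return problems


if __name__ == "__main__":
    v1 = {"amount": {"type": "integer", "required": True},
          "note": {"type": "string"}}
    v2 = {"amount": {"type": "integer", "required": True},
          "note": {"type": "string"},
          "memo": {"type": "string"}}
    print(breaking_changes(v1, v2))  # [] because the only change is additive
```

A build step that fails when this list is non-empty is the kind of check the paragraph above has in mind; real OpenAPI diff tools cover far more cases than this.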

Security basics that are not optional

Two security rules cut across every endpoint, and skipping either is the kind of mistake that ends up in an incident review. The first: authorise on every endpoint, every time, on the server, against the authenticated principal. Authentication answers "who are you"; authorisation answers "may you do this to this specific object." Many real breaches come not from broken crypto but from a missing object-level check (broken object level authorization sits at the top of the OWASP API Security Top 10): an endpoint that confirms you are logged in but never confirms that charge ch_001 belongs to you before returning it. The defence is to make the ownership check part of the query, not an afterthought: fetch the object scoped to the caller (WHERE id = ? AND account_id = ?) so an object you do not own simply does not exist as far as you are concerned, and you return a clean 404.

The second: keep secrets out of URLs. API keys, tokens, and session ids do not belong in the path or query string, because URLs are logged everywhere: in access logs, proxy logs, browser history, and the Referer header sent to third parties. Credentials go in the Authorization header, over TLS, and nowhere else. While you are at it: never reflect a secret back in an error message, never put personally identifying data in a cacheable GET URL, and apply input limits (max body size, max array length, max string length) so a single request cannot exhaust memory. None of this is exotic. It is the floor, and the audience for these notes is exactly the engineer who is expected to know it without being told.

Authorise per object, not per route. "Is this user logged in" is the easy half. "Does this user own the thing they just asked for by id" is the half that actually keeps data from leaking, and it has to live in the data access path, not a middleware that only sees the route.
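The scoped fetch can be sketched directly from the WHERE clause given earlier. The charges table layout and the get_charge name are assumptions for the example, using Python's sqlite3 module.

```python
import sqlite3
from typing import Optional


def get_charge(db: sqlite3.Connection, account_id: str, charge_id: str) -> Optional[dict]:
    """Fetch a charge only if it belongs to the authenticated account."""
    row = db.execute(
        # Ownership is part of the lookup itself, not a check made afterwards.
        "SELECT id, amount FROM charges WHERE id = ? AND account_id = ?",
        (charge_id, account_id),
    ).fetchone()
    if row is None:
        # Missing and not-yours look identical; the caller maps both to 404.
        return None
    return {"id": row[0], "amount": row[1]}
```

Because there is no code path that loads the row without the account filter, forgetting the authorisation check is not possible in this function.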

A summary checklist

  • Plural, lowercase, hyphenated resource paths; nouns in the path, verbs as HTTP methods.
  • Correct method per intent; one status code per outcome, and never 200 on a failure.
  • 4xx (other than 429) means do not retry as-is; 5xx and 429 mean retry with backoff.
  • Cursor pagination, never offset for collections that grow.
  • Errors as RFC 9457 problem details, with stable type URIs and no leaked internals.
  • Rate-limit headers on every response; Retry-After on 429.
  • Idempotency keys on every state-mutating POST that would do harm if duplicated.
  • Request IDs (or W3C Trace Context) on every request and response.
  • Standard filter/sort/fields parameters; document them once, reuse everywhere.
  • Consistent casing, UTC timestamps, money as integer minor units; empty arrays, never null.
  • OpenAPI document checked in, updated with every change, generating SDKs and docs.
  • Additive changes by default; version bump only when a change cannot be additive (see versioning).
  • Authorise per object on every endpoint; secrets in headers, never in URLs.
