Day-0 → Month-3 · curriculum
Study path · API design

API design,
every wire format.

An API is a contract. The format is just bytes — JSON, Protobuf, Thrift, CBOR — and any of them work. The hard parts are the boundaries you draw, the changes you can make later without breaking last year's clients, and the cost of every request. The mental models, the RFCs, and the labs that get you there.


Why APIs matter.

A function call inside a process is a contract you can change at compile time — every caller is recompiled, every type is checked. An API call across a network is a contract you can change at most once a year, while millions of clients you do not control keep calling the version you shipped last March. That asymmetry is the whole game.

Roy Fielding's 2000 dissertation gave us REST: six architectural constraints that turn HTTP into a uniform interface. Most "REST APIs" only follow some of them, and that's usually fine — the word has drifted to mean "HTTP+JSON RPC over nouns", which is a perfectly good way to ship product. But under the surface, every successful API — REST, gRPC, GraphQL, anything — solves the same four problems: what's the shape, what's the wire, how does it evolve, and what does retry mean.

The contract outlasts the code. The Stripe API is a single date-pinned surface from 2011 to today. Twilio's older endpoints predate iPhone 4. The wire format you ship in 2026 will outlive at least three rewrites of the implementation; design the surface first, the implementation second.

Twelve mental models.

Twelve concepts cover ~95% of API surface. Get these in your bones in the first month; every API style you encounter (REST, gRPC, GraphQL, JSON-RPC, AsyncAPI) is a recombination of them.

01 Resource modelling Day-zero

Nouns, not verbs. /users/42, not /getUserById?id=42. The single biggest lever in REST design — once your domain is a graph of resources, idempotency, caching, and pagination fall out for free.

02 HTTP verbs & semantics Day-zero

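GET reads, PUT replaces, PATCH amends, DELETE removes, POST creates or acts. GET is safe and idempotent; PUT and DELETE are idempotent but not safe; POST and PATCH are neither unless you design them to be. Every retry policy in your stack depends on you getting these right.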
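To make the "idempotent vs safe vs neither" split concrete, here is a minimal sketch of the check a retrying client might run. The method sets follow RFC 9110 section 9.2; the function name `can_auto_retry` and the `has_idempotency_key` flag are assumptions made for illustration.

```python
# Method properties as listed in RFC 9110, section 9.2.
SAFE = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT = SAFE | {"PUT", "DELETE"}


def can_auto_retry(method: str, has_idempotency_key: bool = False) -> bool:
    """Return True if a client may blindly resend the request after a timeout.

    has_idempotency_key is a hypothetical flag: it means the request carries
    an Idempotency-Key header that the server is known to honour.
    """
    return method.upper() in IDEMPOTENT or has_idempotency_key


for method in ("GET", "PUT", "DELETE", "PATCH", "POST"):
    print(method, can_auto_retry(method))
```

POST and PATCH only become retryable when the server adds its own deduplication, which is why the idempotency-key pattern exists.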

03 Status codes Day-zero

2xx success, 3xx redirect, 4xx your fault, 5xx mine. 422 vs 400, 401 vs 403, 409 vs 412. Returning 200 with {"error":"..."} in the body is the original sin.
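As a contrast to the 200-with-error-body pattern, this is a small sketch of an error response that keeps the status code truthful and puts the details in an RFC 9457 problem-details body. The helper name `problem` and the `code` extension member are assumptions for illustration; RFC 9457 permits extension members but does not define this one.

```python
import json


def problem(status: int, title: str, detail: str, code: str,
            type_uri: str = "about:blank"):
    """Build (status, headers, body) for an RFC 9457 problem-details response."""
    body = {
        "type": type_uri,
        "title": title,
        "status": status,
        "detail": detail,
        "code": code,  # hypothetical extension member: a stable machine-readable code
    }
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


status, headers, body = problem(
    409, "Conflict", "Order ord_123 is already cancelled.", "order_already_cancelled"
)
print(status, headers["Content-Type"])
print(body)
```

Clients can then branch on the HTTP status first and on `code` second, without parsing the human-readable message.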

04 Wire formats Practitioner

JSON for the public web, Protobuf for binary efficiency, Thrift for pluggable transport, MessagePack/CBOR for JSON-shaped binary. The format determines parse cost, schema evolution, and debug ergonomics.

05 Schema evolution Practitioner

Add fields with new numbers; never reuse a tag; never change a type. Protobuf field numbers, Avro readers/writers, OpenAPI semver. The rules that let you ship without breaking yesterday's clients.
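The reason "add fields with new numbers" is safe can be seen on the wire. The sketch below hand-rolls a small subset of the Protobuf encoding (varints and length-delimited strings only, non-negative integers only) to show an old reader skipping a field number it does not know. It is an illustration, not a replacement for a real Protobuf library; the field maps `V1` and `V2` are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    if isinstance(value, int):  # wire type 0: varint
        return encode_varint((number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")  # wire type 2: length-delimited
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def decode(buf: bytes, known: dict) -> dict:
    """Decode buf, keeping only the field numbers listed in known."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length].decode("utf-8"), pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:  # unknown numbers are skipped, not errors
            out[known[number]] = value
    return out


V1 = {1: "id", 2: "email"}             # yesterday's client
V2 = {1: "id", 2: "email", 3: "plan"}  # today's server added field 3
msg_v2 = encode_field(1, 42) + encode_field(2, "a@example.com") + encode_field(3, "pro")
print(decode(msg_v2, V1))
print(decode(msg_v2, V2))
```

Because the key carries only a number and a wire type, reusing a number for a different meaning gives old readers no way to notice; they decode the new value under the old name.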

06 Idempotency Practitioner

A retry must not double-charge, double-send, or double-create. Idempotency-Key on every state-mutating POST. The non-negotiable feature for anything that touches money.
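A minimal in-memory sketch of the server side of Idempotency-Key handling: remember the response for each key and replay it on retry. The class name `IdempotencyStore`, the 24-hour default TTL, and the payload fingerprint check are assumptions for illustration; a production version would need a shared store and protection against two concurrent first requests with the same key.

```python
import hashlib
import json
import time


class IdempotencyStore:
    """In-memory sketch; not safe across processes or concurrent requests."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # key -> (expires_at, request_fingerprint, response)

    def run(self, key: str, payload: dict, handler):
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        now = self.clock()
        entry = self._seen.get(key)
        if entry and entry[0] > now:
            _, seen_fingerprint, response = entry
            if seen_fingerprint != fingerprint:
                raise ValueError("Idempotency-Key reused with a different payload")
            return response  # replay: the handler is not called again
        response = handler(payload)
        self._seen[key] = (now + self.ttl, fingerprint, response)
        return response


charges = []


def create_charge(payload):
    charges.append(payload)
    return {"id": f"ch_{len(charges)}", "amount": payload["amount"]}


store = IdempotencyStore()
first = store.run("key-1", {"amount": 500}, create_charge)
retry = store.run("key-1", {"amount": 500}, create_charge)
print(first == retry, len(charges))
```

Rejecting a reused key with a different payload is a design choice in this sketch; it turns a client bug into a loud error instead of a silently wrong replay.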

07 Pagination Practitioner

Cursor over offset, always. Offset gets slow at page 1000 and gives wrong answers under writes. Encode cursors as base64-JSON of (last_id, last_sort_value).
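The cursor encoding described above can be sketched in a few lines. The row shape (`id`, `created_at`) and the function names are assumptions for illustration; in SQL the filter would correspond to something like `WHERE (created_at, id) > (?, ?) ORDER BY created_at, id LIMIT ?`.

```python
import base64
import json


def encode_cursor(last_id, last_sort_value) -> str:
    raw = json.dumps([last_id, last_sort_value], separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str):
    last_id, last_sort_value = json.loads(base64.urlsafe_b64decode(cursor))
    return last_id, last_sort_value


def page(rows, limit, cursor=None):
    """Return (items, next_cursor) for rows ordered by (created_at, id)."""
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if cursor:
        last_id, last_created = decode_cursor(cursor)
        # Keyset filter: strictly after the last row the client has seen.
        ordered = [r for r in ordered
                   if (r["created_at"], r["id"]) > (last_created, last_id)]
    items = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (encode_cursor(items[-1]["id"], items[-1]["created_at"])
                   if has_more else None)
    return items, next_cursor


rows = [{"id": i, "created_at": 100 + i // 2} for i in range(1, 6)]
items, cursor = page(rows, limit=2)
while cursor:
    more_items, cursor = page(rows, limit=2, cursor=cursor)
    items += more_items
print([r["id"] for r in items])
```

Including the id as a tie-breaker matters: two rows with the same sort value would otherwise be skipped or repeated at a page boundary.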

08 Versioning strategies Platform

URI (/v1/), header (Accept: application/vnd.acme.v2+json), date-based (Stripe-Version: 2024-04-10). Date-based scales best for long-lived APIs; URI is simplest. Pick one strategy, document it, and do not run a second one alongside it.

09 Authentication Platform

API keys for the first ten minutes; HMAC-signed for money; OAuth 2.0 + PKCE for users; mTLS for service mesh and B2B. JWT is a token format, not an auth scheme.

10 Rate limiting Platform

Token bucket per identity, not per IP. Surface the budget via X-RateLimit headers. Send 429 with Retry-After. Document the bursts; clients tune to them.
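A minimal token-bucket sketch, keyed per identity by keeping one bucket per API key in a dictionary. The class name, the exact header names, and the injectable `clock` parameter are assumptions for illustration; the X-RateLimit-* names are a common convention, not a standard.

```python
import math
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full, so bursts up to capacity pass
        self.updated = clock()

    def try_acquire(self):
        """Return (allowed, headers) for one request."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, {
                "X-RateLimit-Limit": str(self.capacity),
                "X-RateLimit-Remaining": str(int(self.tokens)),
            }
        # Seconds until one whole token is available again, rounded up.
        wait = math.ceil((1 - self.tokens) / self.rate)
        return False, {"Retry-After": str(wait)}  # send with status 429


buckets = {}  # identity (for example an API key id) -> TokenBucket


def check(identity: str):
    if identity not in buckets:
        buckets[identity] = TokenBucket(capacity=5, refill_per_second=1.0)
    return buckets[identity].try_acquire()


for _ in range(6):
    print(check("key_abc"))
```

The capacity is the documented burst size and the refill rate is the sustained limit, which are the two numbers clients end up tuning against.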

11 Streaming & push Practitioner

SSE for one-way (AI tokens, log tail, dashboards). WebSockets for full-duplex (chat, games). gRPC bidi for server-to-server streams. Webhooks for cross-org callbacks.
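For a sense of how small the SSE wire format is, here is a sketch of a function that frames one event as it would appear in a `text/event-stream` response body. The function name `sse_event` is an assumption; the `id:`, `event:`, and `data:` field names come from the HTML EventSource specification.

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None,
              event_id: Optional[str] = None) -> bytes:
    """Frame one Server-Sent Event; a blank line terminates the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload is sent as one "data:" field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return ("\n".join(lines) + "\n\n").encode("utf-8")


print(sse_event("hello"))
print(sse_event("line one\nline two", event="token", event_id="7"))
```

The `id` field is what lets a reconnecting browser send Last-Event-ID, so a server that sets it can resume a stream instead of restarting it.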

12 Observability Operator

Request-id propagation, traceparent (W3C), Server-Timing, structured logs per route template. Without these you cannot debug at scale; with them, the API documents itself.
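As an illustration of traceparent propagation, this sketch parses an incoming W3C Trace Context header and builds the value to send downstream: same trace id, fresh span id. It handles version 00 only; the function names are assumptions, and starting a new trace with the sampled flag set (01) is an arbitrary choice for the sketch.

```python
import re
import secrets

# version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex trace-flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid per the spec
    return trace_id, parent_id, flags


def next_traceparent(incoming=None) -> str:
    """Header value for an outgoing call: keep the trace id, mint a new span id."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed:
        trace_id, _, flags = parsed
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # start a new trace
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(next_traceparent(incoming))
print(next_traceparent(None))
```

Logging the trace id alongside the route template on every request is what lets one id stitch together logs from every hop.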

Day zero — first hour.

One hour. Read RFC 9110 sections 9 and 15 (HTTP methods and status codes). Open Stripe's API reference. Then pick one of your own endpoints and write its OpenAPI fragment. The bar is muscle: you have read the canonical HTTP semantics and you have authored a machine-readable contract for one real endpoint.

# 1. Read RFC 9110 §9 (methods) + §15 (status codes) (≈ 30 minutes)
#    https://datatracker.ietf.org/doc/html/rfc9110

# 2. Tour Stripe's API reference (≈ 20 minutes)
#    https://stripe.com/docs/api
#    Note: idempotency keys, expandable objects, error codes,
#    cursor pagination, the version header.

# 3. Write OpenAPI for one endpoint you own (≈ 20 minutes)
#    Use the Swagger Editor — it lints as you type.
#    https://editor.swagger.io
openapi: 3.1.0
info: { title: Orders, version: '2024-08' }
paths:
  /v1/orders/{id}:
    get:
      operationId: getOrder
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      responses:
        '200':
          description: The order.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Order' }
        '404': { $ref: '#/components/responses/NotFound' }

# 4. Generate a client; call your real endpoint
npx @openapitools/openapi-generator-cli generate \
    -i openapi.yaml -g typescript-fetch -o ./client

Done. You have read the right two RFC sections, studied the gold-standard public API, and written a real OpenAPI doc. Everything below extends from this beachhead.

Week 1 to Month 3 — pick a track.

After the first hour you can read API design writing without bouncing off it. The next three months should be one track at a time, depth-first. Don't try to learn gRPC and GraphQL in the same fortnight; pick the one that maps to your job and finish it.

REST done well

Resource modelling, verb semantics, status codes, idempotency keys, conditional requests, ETags. Roy Fielding's dissertation chapter 5; Mark Massé's "REST API Design Rulebook"; the Heroku and Stripe public APIs as worked examples.

gRPC + Protobuf

HTTP/2 framing, the four streaming modes, deadlines, interceptors, Protobuf wire format, schema evolution rules. Read the gRPC core docs and the Protobuf encoding guide; build a tiny chat-style bidi service to make it real.

GraphQL & federation

Schemas, resolvers, the N+1 problem, DataLoader, persisted queries, edge caching. Apollo Federation for multi-team graphs. The Production Ready GraphQL book (Marc-André Giroux) and the Shopify engineering posts.

Real-time & streaming

WebSockets vs SSE vs long-polling vs WebTransport. The HTML EventSource spec; RFC 6455; the Phoenix Channels and Centrifugo design notes. Pick a real use-case (presence, log tail, chat) and ship it twice — once each over SSE and WebSockets.

Webhooks done safely

At-least-once delivery, exponential retries, HMAC signatures, replay defense, the Standard Webhooks spec. Stripe, GitHub, and Shopify all publish their delivery semantics — read three; build one.
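A minimal sketch of HMAC signing with replay defense for webhook deliveries: sign the timestamp plus the raw body, reject stale timestamps, and compare in constant time. The `timestamp.body` message layout, the hex encoding, and the function names are assumptions for illustration; this is not the exact Standard Webhooks header format, which should be followed from its own spec.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, raw_body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<raw body bytes>", hex encoded."""
    message = str(timestamp).encode() + b"." + raw_body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, raw_body: bytes, signature: str,
           now: Optional[float] = None, tolerance_seconds: int = 300) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_seconds:
        return False  # too old or too far in the future: treat as a replay
    expected = sign(secret, timestamp, raw_body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = b"whsec_example"           # hypothetical shared secret
body = b'{"type":"order.paid"}'     # the bytes exactly as received
ts = 1_700_000_000
sig = sign(secret, ts, body)
print(verify(secret, ts, body, sig, now=ts + 10))
print(verify(secret, ts, body, sig, now=ts + 3600))
```

Verification has to run on the bytes as received; re-serialising parsed JSON can reorder keys or change whitespace and break an otherwise valid signature.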

Auth, the long form

API keys → HMAC SigV4 → OAuth 2.0 + PKCE → mTLS / SPIFFE. Read the RFC 6749 + 7636 pair end to end; then read the OAuth threat model (RFC 6819). Stand up Hydra or Authlib locally.

Versioning & evolution

URI vs header vs date-based. Stripe's deprecation playbook (Sunset header, brownouts, dual-write). Protobuf field rules. Avro reader/writer compatibility. Buf's breaking-change linter is the right CI gate.


Books worth reading.

2018 · O'Reilly
Brenda Jin, Saurabh Sahni, Amir Shevat — Designing Web APIs

Slack's book. Resource design, errors, pagination, async, the whole interview. Light on theology, heavy on "this is what we shipped and what we wish we had done differently". The right starter book for app-builders.

2022 · Manning
Arnaud Lauret — The Design of Web APIs (2nd ed.)

Lauret's OpenAPI-first methodology. Walks an API from problem statement through OpenAPI doc through implementation. The book to hand a junior engineer who is about to ship their first public endpoint.

2020 · self-published
Marc-André Giroux — Production Ready GraphQL

The most operational GraphQL book. N+1, persisted queries, schema versioning, federation pitfalls, security. Skip the marketing material; read this if you're actually running a GraphQL API.

2022 · O'Reilly
James Gough, Daniel Bryant, Matthew Auburn — Mastering API Architecture

API gateways, service meshes, OpenAPI, gRPC, asynchronous messaging. Reasonably technology-agnostic; pairs well with the SRE book's service-design chapter.

2011 · O'Reilly
Mark Massé — REST API Design Rulebook

A 100-page set of normative rules. Some are debatable, some are gold. The discipline of having a rulebook at all is the right starting point — adopt 60% of his rules and write down which 40% you broke.

2021 · Manning
Samer Buna — GraphQL in Action

A solid GraphQL primer. Less opinionated than Giroux's book; better for first-time-with-GraphQL learners. Chapter on subscriptions is worth the price alone.

2013 · O'Reilly
Sam Ruby, Mike Amundsen, Leonard Richardson — RESTful Web APIs

The "Richardson Maturity Model" book. Hypermedia, profiles, ALPS, content negotiation. Classic-leaning; read it once for the conceptual frame, then come back to it when designing your own media types.

Honourable mentions: The OpenAPI Handbook (Frank Kilcommins, free); API Security in Action (Neil Madden — the only good book on API auth threat models); Designing Distributed Systems (Burns) for the platform-team operational frame.

Papers worth reading.

Twelve documents — half RFCs, half engineering essays — that define modern API design. Read them in order; read each twice if you can. Most are 10–30 pages.

  1. 01
    2000 · Roy Fielding
    Architectural Styles and the Design of Network-based Software Architectures (Ch. 5: REST)

    The dissertation. Chapter 5 alone is the definition of REST — six constraints, the uniform interface, hypermedia as the engine of application state. Most "REST APIs" follow about half of it; that's usually fine, but you should know which half.

  2. 02
    2014 · IETF
    RFC 7230–7235 — HTTP/1.1 (revised)

    The 2014 revision of HTTP/1.1, in six parts, and the set most existing HTTP writing still cites. Read 7230 (message syntax) and 7234 (caching) at minimum; everything else is reference. RFC 9110 (semantics), RFC 9111 (caching), and RFC 9112 (HTTP/1.1 messaging) obsoleted the set in 2022, so treat those as the normative text.

  3. 03
    2015 · Belshe, Peon, Thomson
    RFC 7540 — HTTP/2

    Binary framing, multiplexed streams, HPACK header compression, server push (now deprecated). The substrate most HTTPS traffic rode on before HTTP/3, and the one gRPC was designed for. RFC 9113 (2022) obsoletes this document without changing the wire protocol.

  4. 04
    2018 · IETF
    RFC 8446 — TLS 1.3

    The TLS rewrite. Forward secrecy by default, 1-RTT handshake, 0-RTT resumption with care. If your API requires a transport-security mental model, this is the floor.

  5. 05
    2017 · Sambra et al
    Solid: A Platform for Decentralized Social Applications

    Tim Berners-Lee's linked-data social platform. Worth reading not for adoption (low) but for the design discipline — what an API looks like when "data ownership" is the first constraint.

  6. 06
    2022 · Katz, Gebhardt et al
    JSON:API v1.1

    The most thoughtful prescriptive REST style spec. Resource objects, relationships, sparse fieldsets, sort/filter conventions, pagination. Even if you don't adopt JSON:API, the conventions are worth borrowing one at a time.

  7. 07
    2017 · OpenAPI Initiative
    OpenAPI Specification 3.0

    The de-facto API description language. Swagger's grown-up form. If your team writes OpenAPI from day one and treats it as the contract, half the typical API problems vanish (client codegen, mock servers, contract testing).

  8. 08
    2019 · IETF
    RFC 8594 — The Sunset HTTP Header

    A short informational RFC. The right way to deprecate an endpoint. Stripe formalised this in production years before; the RFC catches up. Use it.

  9. 09
    2022 · IETF
    RFC 9110 — HTTP Semantics

    The unified normative HTTP semantics doc. Replaces the patchwork of 7230–7235. Read sections 9 (methods) and 15 (status codes) and you have the canonical reference for "what does GET/PUT/POST mean" forever.

  10. 10
    2023 · IETF
    RFC 9457 — Problem Details for HTTP APIs

    The right error envelope. type / title / status / detail / instance. Every API should ship this shape and a stable error code; clients should branch on the code, not the message.

  11. 11
    2018 · Google
    API Improvement Proposals (AIPs)

    Google's public API style guide, written as numbered design notes. Read AIP-122 (resource names), AIP-132 (the standard List method), AIP-158 (pagination), AIP-160 (filtering), AIP-193 (errors). Easily the most opinionated, well-argued public API guide in existence.

  12. 12
    2007 · Pat Helland
    Life Beyond Distributed Transactions

    Helland's argument that everything important happens at the seams between systems — and APIs ARE those seams. Reframes idempotency, partial failure, and message-passing as architectural primitives, not implementation details.

Going further: RFC 7235 (HTTP authentication), RFC 6749 (OAuth 2.0) and RFC 7636 (PKCE), RFC 7519 (JWT), RFC 6455 (WebSocket protocol), The Twelve-Factor App (Wiggins), and the Standard Webhooks spec from Svix.

Hands-on tools.

Theory without runnable artefacts is fragile. Each of these is a tractable way to make an API design choice and watch it push back when you make a mistake.

Environment | Cost | Best for
Swagger Editor + OpenAPI 3.1 | Free | Author OpenAPI in the browser; lints as you type. Generate clients via openapi-generator. The single fastest path from "API idea" to "callable client SDK".
Buf · breaking-change linter | Free, open-source | buf breaking detects every backwards-incompatible Protobuf change as a CI check. Set this up the day you ship your first .proto; it catches the mistakes that bite you in 6 months.
grpcurl + Reflection | Free, open-source | The curl for gRPC. With Reflection enabled, grpcurl introspects the schema live; you can poke any gRPC service from a shell. Ship Reflection on dev/staging; gate it behind auth in prod.
Apollo Studio + GraphiQL | Free tier | The dev environment for GraphQL. Schema explorer, persisted-query store, performance tracing per resolver. Even outside Apollo's runtime, the schema-registry workflow is worth borrowing.
Hurl + Bruno (or Postman) | Free, open-source | Plaintext request/response files checked into git. The right tool for "test my API by hitting it": version-controlled, code-reviewable, replayable in CI.
httpbin / postman-echo | Free | Public services that echo your request back as JSON, including the headers and body. Indispensable for sanity-checking client SDKs and proxies before you point at production.

Wire formats — side by side.

Eight formats, four properties. The size and parse columns are relative to JSON and are rough ranges that vary with payload shape and library; the schema column tells you whether a reader needs a schema file to make sense of the bytes.

Format | Size on the wire | Parse cost | Schema | Sweet spot
JSON | Baseline | Slow | Optional (JSON Schema) | Public APIs, debugging, low-volume internal
Protobuf | 3–10× smaller | 5–20× faster | Required (.proto) | gRPC, Kafka, internal high-throughput
Thrift (Compact) | ~Same as Protobuf | ~Same as Protobuf | Required (.thrift) | Meta / Twitter ecosystems; pluggable transport
MessagePack | ~30–50% smaller than JSON | Faster than JSON | Optional | JSON-shaped data over the wire, no schema cost
CBOR | ~30–50% smaller than JSON | Comparable to MessagePack | Optional (CDDL) | IoT, WebAuthn, COSE-signed payloads
FlatBuffers | Largest of the binaries | Zero-copy | Required (.fbs) | Game state, mmap'd files, ultra-low-latency
Avro | ~Same as Protobuf | Fast (schema-required) | Required (.avsc, schema travels) | Hadoop/Kafka pipelines with Schema Registry
BSON | Larger than JSON | Faster than JSON | None | MongoDB internal storage

Defaults: public APIs → JSON; gRPC → Protobuf; Kafka pipelines → Avro (with Schema Registry) or Protobuf; IoT / WebAuthn → CBOR; games / mmap'd files → FlatBuffers. Do not pick a binary format for a public API.

Common mistakes.

Patterns every team writes at least once. Read these now; recognise the shape later, when something on-call is misbehaving and the dashboard is unhelpful.

200 OK with {"error": "..."} in the body
The original sin. Use real status codes — 4xx for client errors, 5xx for yours, plus an RFC 9457 problem-details body. Tooling, retries, and clients all assume the status code is the truth.
Tunneling everything through POST
Verb-style URLs (POST /createUser, POST /getUser?id=42) sacrifice idempotency, cacheability, and HTTP semantics. Resources are nouns; verbs are verbs; the matrix mostly works out.
Reusing Protobuf field numbers
A reused number makes old clients decode the new field as the old one: silent corruption if the wire types match, dropped or garbage data if they do not. After deleting a field, reserve both its number and its name: reserved 4, 7; reserved "old_field";
Offset pagination at scale
At 50 rows per page, ?page=1000 forces the database to scan and discard about 50,000 rows before it returns anything. Worse, inserts and deletes during paging produce duplicates and gaps. Cursor-paginate from day one; the cost is one extra opaque string.
JWT with HS256 + RS256 acceptance
If your verifier accepts both algorithms and trusts the token's alg header, an attacker signs an HS256 token using your RSA public key as the HMAC secret and forges admin tokens. Pin the algorithm per key. Always reject alg: none.
No idempotency on payment endpoints
Network retry → double charge. Stripe-style Idempotency-Key headers cost ~50 lines of code (key + hash table + 24h TTL) and prevent the worst class of production incidents.
Versioning by Content-Type AND URL AND custom header
Pick one strategy. Two is a smell, three is unmaintainable. The team that "supports all three for flexibility" cannot tell which one the client actually used.
Webhooks without HMAC signing
Anyone with the URL can fire fake events. Sign on raw bytes (not parsed JSON); enforce a 5-minute timestamp window; constant-time-compare. Use the Standard Webhooks header set.
Synchronous fan-out for N integrations
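When checkout calls 10 webhooks inline, latency is the sum of all ten if they run serially and the slowest of the ten if they run in parallel. Either way the tail compounds: with ten independent calls, about 1 request in 10 (1 − 0.99^10 ≈ 9.6%) hits at least one dependency's p99. Move webhooks to a queue. Return to the user immediately; deliver the events with retries.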
Surfacing internal IDs as sequential integers
GET /orders/12345 leaks volume and enables enumeration attacks. Use UUID/KSUID; prefix with the resource type (ord_, cus_) for grep-ability and leak-detection.
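The synchronous fan-out item above can be put into numbers. This is a back-of-the-envelope sketch that assumes the downstream calls are independent and that "slow" means a call lands beyond its own p99 (probability 0.01); real dependencies are often correlated, so treat the output as a rough guide.

```python
def p_any_slow(n_calls: int, per_call_slow_probability: float = 0.01) -> float:
    """Probability that at least one of n independent calls is in its slow tail."""
    return 1 - (1 - per_call_slow_probability) ** n_calls


for n in (1, 10, 50):
    print(f"{n:>2} inline calls: {p_any_slow(n):.1%} of requests see a tail-latency call")
```

With ten inline calls, a latency that each dependency sees once in a hundred requests shows up in roughly one in ten checkouts, which is the arithmetic behind moving deliveries onto a queue.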

Quick test.

Ten cards covering the questions interviewers ask, the things that bite operators in production, and the trivia that separates "I write APIs" from "I design them".

Card 1 of 10
A POST endpoint creates a charge. The network drops; the client retries. What do you need to do to prevent double charges?
Suggested sequences

Reading progressions

Three ordered paths through this material — pick the one that matches where you are.

Path 01 · REST
HTTP & REST fundamentals

The protocol layer first, then the architectural style on top of it.

  1. How HTTP Works
  2. HTTP Flow Simulator ↗
  3. API Gateway — edge enforcement
  4. HTTPS / TLS — securing the channel
  5. HTTP Caching — cache-control semantics
Path 02 · gRPC & binary
Beyond REST — gRPC & Protobuf

When REST's constraints become friction and binary protocols earn their complexity.

  1. gRPC vs REST Simulator ↗
  2. JSON vs Protobuf Simulator ↗
  3. HTTP/2 Streams Simulator ↗
  4. WebSockets — for streaming APIs
Path 03 · Auth & security
Authentication & authorization

How modern auth is built on top of HTTPS, tokens, and open standards.

  1. OAuth 2 — delegated authorization
  2. OIDC — identity tokens
  3. JWT Lifecycle Simulator ↗
  4. API Gateway — enforcing auth at the edge

What's next.

API design rewards re-reading. RFC 9110 read on day 30 and again on day 300 will give you different things. So will the Stripe API reference. So will Pat Helland's QCon talk. The field is not large; it is dense and citation-heavy and has been compounding since roughly 1996.

Pick one real public API and read its reference end to end. Stripe, GitHub, Twilio, AWS S3, Slack — all open and well-documented. Then re-read your own API and rewrite the parts that embarrass you. You will rewrite some.