gRPC vs REST: RPC, two ways.

REST: HTTP+JSON, resource-oriented, cacheable. gRPC: HTTP/2 + Protobuf, RPC-shaped, streaming. Same call. Very different wire.


A request to "get an order"
# .proto contract
service Orders {
 rpc Get (GetReq) returns (Order);
 rpc Stream (Filter) returns (stream Order); // server streaming
 rpc Bulk (stream BulkReq) returns (BulkResp); // client streaming
}

# Wire (HTTP/2 frame, Protobuf body)
:method = POST
:path = /Orders/Get
content-type = application/grpc+proto
te = trailers
authorization = Bearer eyJhbGc...

# Body: 5-byte gRPC framing (flag + 4-byte length) + Protobuf body
0x00 0x00 0x00 0x00 0x09 08 b8 3f 12 04 ...

# Trailers (HTTP/2 HEADERS frame after the body)
grpc-status = 0
When to pick what
Axis | REST | gRPC
Browser-native | yes | no (needs a grpc-web proxy)
Wire size | JSON; verbose | Protobuf; compact
Latency | one request at a time per connection; a new connection per call without keep-alive (HTTP/1.1) | many calls multiplexed on one HTTP/2 connection
Streaming | SSE or WebSockets | native uni- and bi-directional
Schema | OpenAPI (optional) | .proto required, codegen everywhere
Caching (intermediaries) | native HTTP caching | none
Errors | HTTP status code | grpc-status code + optional details, in trailers
Debugging | curl, browser devtools | grpcurl, Wireshark

What you're looking at

One toggle, two wire dumps. Both panels show the same logical call, fetch order 8120, as the bytes that leave the machine. REST mode is the familiar exchange: a GET, readable headers, a JSON body you can scan by eye. gRPC mode shows the .proto contract first, then the wire underneath it: a POST to /Orders/Get on an HTTP/2 stream, a binary Protobuf body behind a 5-byte framing prefix, and the call status arriving in a trailer after the body. The table above scores both on the axes that decide real designs.

Flip between the modes a few times and read the gRPC panel closely. Two details should surprise you. Every gRPC call is a POST, even a pure read; the method name lives in the path, not the verb. And grpc-status rides in a trailer that follows the body, which is why gRPC needs HTTP/2 specifically and why a browser cannot speak it without a proxy. In the matrix, the row that settles most arguments is caching: REST gets CDN caching for free, gRPC gets none.

gRPC vs REST — what's the difference?

Two services, one shared API.

REST (Roy Fielding's 2000 dissertation) and gRPC (Google, 2015) are two ways to express the same thing: a service contract over HTTP. In practice, REST means verbs applied to resources over HTTP (most often HTTP/1.1) with JSON bodies; gRPC means method calls over HTTP/2 with Protobuf bodies. The simulator above runs the same logical operation through both stacks so you can see the wire-size and round-trip-cost differences directly.

Imagine two services need to talk to each other. The order service has a function called create_order(customer_id, items). The shipping service wants to call it. They don't share memory, they don't share a process — they share a network. The question is what bytes to put on the wire so the order service can reconstruct the call, run the function, and return the result.

The first instinct, “serialise the call as text the way humans would”, gives you something REST-shaped. POST a JSON document to /orders; receive a JSON document back. The advantages are immediate: every language has a JSON parser, every developer can read the payload by eye, every browser can hit it with fetch(), every CDN already understands HTTP caching. The cost is hidden: a typical order comes to roughly 200 bytes of JSON, parsing it eats CPU, the field names "customer_id" and "items" travel on the wire in every single message, and there's no enforced schema, so the receiver hopes the sender sent the right shape.

The other instinct, “the call is already a typed function in code, ship the typed binary”, gives you RPC. Define the message shape once in a schema file (.proto), generate typed clients and servers in every language, and put compact binary bytes on the wire. The same order takes maybe 60–80 bytes of Protobuf, parses 5–10× faster, and a typo in a field name is rejected at compile time because the generated code has no such field. The cost: it's not human-readable, browsers need a translation layer (gRPC-Web), and you can't test the endpoint with plain curl.

This is the REST vs gRPC trade-off. REST + JSON wins for public APIs, browser apps, partner integrations, anything where the consumer is unknown. gRPC + Protobuf wins for internal mesh traffic between services you control, where 3× wire compression and 5× CPU savings translate to real money at scale (an internal mesh moving 100 TB/month saves roughly $6,000/month on AWS egress alone). The simulator above lets you run identical request-response and streaming workloads on both protocols and watch the byte counts and CPU costs diverge.

gRPC additionally offers things REST can't natively express: four call shapes (unary, server-streaming, client-streaming, bidirectional), deadline propagation across a fan-out chain, automatic cancellation, and status codes carried in HTTP/2 trailers, which pass through any HTTP/2 proxy that forwards trailers. The cost is operational: load balancing long-lived HTTP/2 connections needs deliberate care, the wire format is opaque to curl, and a gRPC server answers HTTP 200 even when the call failed. The right answer is rarely “all gRPC” or “all REST”; it's gRPC inside the mesh, REST at the edge, and a Connect-style runtime when both must coexist.

Figure: Same order, JSON on the wire vs Protobuf on the wire. JSON, 100 bytes: {"id":"o7","customer_id":"c12","items":[{"sku":"a","qty":2},{"sku":"b","qty":1}],"status":"PENDING"}. Protobuf, 25 bytes: 0a 02 6f 37 12 03 63 31 32 1a 05 0a 01 61 10 02 1a 05 0a 01 62 10 01 20 00 (tags + varint values; a proto3 encoder drops the final 20 00 because PENDING = 0 is the default, leaving 23 bytes). Field names become numeric tags, integers become varints, and there are no quotes or commas.

Origins — REST 2000, gRPC 2015, two lineages

Two lineages, one wire each.

REST and gRPC come from opposite traditions. Representational State Transfer was named in chapter five of Roy Fielding's 2000 doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures, supervised by Richard Taylor at UC Irvine. Fielding had been one of the principal authors of HTTP/1.0 (RFC 1945) and HTTP/1.1 (RFC 2068, later 2616, now RFCs 9110 and 9112), and the dissertation was, in effect, a retrospective justification of the decisions baked into the web. REST is not a protocol; it is a set of constraints (client-server separation, statelessness, cacheability, uniform interface, layered system, and optional code on demand) that, taken together, predict the scaling behaviour of the web itself.

Remote procedure call has a much older lineage. Birrell and Nelson's 1984 paper Implementing Remote Procedure Calls at Xerox PARC defined the canonical model: a client calls what looks like a local function; a stub marshals the arguments; a transport ships them; a skeleton on the server unmarshals them; the procedure runs; the return value is shipped back. Sun ONC RPC (RFC 1057, 1988) shipped this model under NFS. CORBA (Common Object Request Broker Architecture, OMG 1991) tried to do it across languages with an Interface Definition Language. DCE/RPC shipped it inside Microsoft Windows. Java RMI, SOAP, XML-RPC, Apache Thrift (Facebook 2007), Cap'n Proto RPC — every decade reinvents the same shape with a different IDL and a different wire format.

gRPC sits at the end of this lineage. Google open-sourced it in early 2015 as the public face of their internal RPC system, Stubby, which had been running inside Google for a decade. The two design choices that distinguish gRPC from its predecessors are both pragmatic. First, the wire is HTTP/2 (RFC 7540 at the time, now 9113), so gRPC traffic can cross HTTP/2-capable proxies, load balancers, and service meshes without a bespoke transport, provided they forward trailers. Second, the IDL is Protocol Buffers, which had already been the de-facto serialisation format inside Google for years. The combination meant gRPC inherited a working transport and a working schema language on day one.

The cultural divide between the two camps is real. REST advocates value the web's accidental properties — cacheability, debuggability, hyperlink discovery, the ability to type a URL into a browser. RPC advocates value strong typing, schema enforcement, low overhead, and the fact that the call site looks exactly like a function call. Most of the noise in the "REST vs gRPC" debate is people from one camp dismissing the values of the other.


HTTP/2 trailers and gRPC's four call shapes

HTTP/2 trailers, four call shapes.

A gRPC call is an HTTP/2 POST. The :path pseudo-header is /{ServiceName}/{MethodName}, with the service name package-qualified when the .proto declares a package (for example /orders.v1.OrderService/CreateOrder). The content-type is application/grpc+proto (or +json for the JSON variant). The body is a sequence of length-prefixed messages: a one-byte compression flag, a four-byte big-endian length, then the encoded message. The status of the call does not live in the HTTP status line, which is 200 whenever the request reached a gRPC server. Status lives in HTTP/2 trailer headers: grpc-status (a numeric code from a 17-entry enum) and grpc-message. HTTP/1.1 has trailers too, in chunked encoding, but end-to-end support for them is rare (browsers, for one, do not expose them to scripts). gRPC depends on trailers and on HTTP/2 framing, so the protocol is defined only over HTTP/2 and cannot run on plain HTTP/1.1.
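The framing is simple enough to sketch with Python's standard library. This is a minimal illustration, not a gRPC implementation. GetReq, and the assumption that its id is field 1, are hypothetical because the contract above does not show GetReq's fields. Compression is left out.

```python
import struct

def varint(n: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | (0x80 if n else 0))  # high bit set = more bytes follow
        if not n:
            return bytes(out)

def frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix one encoded message with gRPC's 5-byte header:
    a 1-byte compression flag, then a 4-byte big-endian length."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message

def unframe(body: bytes) -> list:
    """Split a request or response body back into (compressed, message) pairs."""
    messages, offset = [], 0
    while offset < len(body):
        flag, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        if offset + length > len(body):
            raise ValueError("truncated gRPC message")
        messages.append((bool(flag), body[offset:offset + length]))
        offset += length
    return messages

# Hypothetical GetReq { id = 8120 } with id as field 1, wire type 0 (varint):
# tag byte = (field_number << 3) | wire_type = 0x08.
get_req = bytes([0x08]) + varint(8120)
wire = frame(get_req)
print(wire.hex(" "))  # 00 00 00 00 03 08 b8 3f
```

Streaming calls concatenate several such frames on the same HTTP/2 stream, which is why unframe loops until the body is exhausted.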

gRPC defines four call shapes. Unary is the familiar request-response: one message in, one message out. Server-streaming sends one request and receives a stream of responses: a paginated listing, a sequence of price ticks, the lines of a log file as they're written. Client-streaming reverses that: a stream of requests collapses to a single response, useful for batch ingestion or chunked uploads. Bidirectional streaming opens both directions at once; either side can speak whenever it has data. Every call, whatever its shape, maps onto exactly one HTTP/2 stream (and many calls share one connection), so the framing is identical across shapes; the runtime simply doesn't half-close its side of the stream until the shape says it is done.

REST has no equivalent vocabulary. The closest analogue to server-streaming is Server-Sent Events (HTML Living Standard, also known as EventSource), which streams text/event-stream over HTTP/1.1 or HTTP/2; the closest to bidirectional streaming is WebSockets (RFC 6455), which is a separate protocol piggy-backing on the HTTP upgrade handshake. Both work, neither feels native, and neither is part of the original REST architectural style — Fielding's 2000 thesis assumes pure request-response.

Two other gRPC features rarely surface in tutorials but matter in production. Deadline propagation: when the caller sets a deadline, the call carries a grpc-timeout header, and any RPC the server makes on behalf of that call, using the incoming call's context, inherits whatever remains of the original budget. A request that started with a 200ms budget can't accidentally fan out into 30 downstream calls with their own 5s timeouts; the budget is consumed transitively. Cancellation propagates the same way: when a client cancels a call (or its deadline expires), the server's context is cancelled and any in-flight downstream calls made with that context are cancelled in turn. Both behaviours are built into the gRPC runtime libraries (in some languages the context must be passed along explicitly); replicating them in REST requires manual plumbing.
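A sketch of the arithmetic behind deadline propagation, in Python. The grpc-timeout format (at most 8 digits followed by one of H, M, S, m, u, n) comes from gRPC's HTTP/2 protocol description. The function names, and the choice to encode with the finest unit that fits while rounding up, are assumptions of this sketch and not necessarily what any particular runtime does.

```python
# grpc-timeout units, in nanoseconds.
UNIT_NS = {"H": 3600 * 10**9, "M": 60 * 10**9, "S": 10**9,
           "m": 10**6, "u": 10**3, "n": 1}
MAX_VALUE = 99_999_999  # TimeoutValue is at most 8 ASCII digits

def decode_grpc_timeout(header: str) -> int:
    """Parse a grpc-timeout value such as '200m' into nanoseconds."""
    value, unit = header[:-1], header[-1:]
    if unit not in UNIT_NS or not (value.isascii() and value.isdigit()) or len(value) > 8:
        raise ValueError(f"bad grpc-timeout: {header!r}")
    return int(value) * UNIT_NS[unit]

def encode_grpc_timeout(remaining_ns: int) -> str:
    """Encode a budget with the finest unit whose value fits in 8 digits.
    Rounding up (never down) is this sketch's choice, not a spec rule."""
    if remaining_ns <= 0:
        raise ValueError("deadline already expired; fail with DEADLINE_EXCEEDED")
    for unit in ("n", "u", "m", "S", "M", "H"):
        value = -(-remaining_ns // UNIT_NS[unit])  # ceiling division
        if value <= MAX_VALUE:
            return f"{value}{unit}"
    raise ValueError("timeout too large to encode")

def downstream_timeout(incoming: str, received_at_ns: int, now_ns: int) -> str:
    """Header for an outgoing call made while serving an incoming one:
    whatever is left of the caller's budget, never a fresh timeout."""
    remaining = decode_grpc_timeout(incoming) - (now_ns - received_at_ns)
    return encode_grpc_timeout(remaining)

# The caller sent 200 ms; the handler spent 35 ms before calling downstream.
print(downstream_timeout("200m", received_at_ns=0, now_ns=35_000_000))  # 165000u
```

Every downstream timeout is derived from the parent's remaining budget and never minted fresh. That is what stops the 30-call fan-out described above from outliving the original 200 ms.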

Figure: Four call shapes, one HTTP/2 stream each. Unary: request → response. Server streaming: request → N responses. Client streaming: N requests → response. Bidirectional: requests and responses interleaved.
Protocol | Transport | Schema | Streaming | Browser
REST | HTTP/1.1 or 2 | OpenAPI (optional) | SSE / WebSocket | native
gRPC | HTTP/2 (required) | .proto (mandatory) | unary + 3 streaming shapes | gRPC-Web proxy
GraphQL | HTTP POST | SDL (mandatory) | subscriptions over WebSocket | native
tRPC | HTTP / WebSocket | TS types (inferred) | subscriptions | native (TS only)

Protobuf — the contract becomes the code

The contract becomes the code.

A gRPC service is defined in a .proto file. The protoc compiler reads it and emits language-specific source: a typed client stub that the caller imports as if it were any other library, and a server skeleton that the implementer fills in. Codegen exists for Go, Java, Kotlin, C++, Python, C#, Ruby, PHP, Node.js, Dart, Objective-C, Swift, and Rust (via tonic), among others. The schema is the single source of truth; every language sees the same field names, the same field numbers, the same enum values.

REST has equivalents but they are optional and arrive late. OpenAPI (formerly Swagger, now under the OpenAPI Initiative at the Linux Foundation, with a 3.1 spec aligned to JSON Schema 2020-12) describes a REST API in a JSON or YAML document; tools like openapi-generator emit clients and server stubs in dozens of languages. The catch is that OpenAPI usually documents an API after the fact; gRPC's .proto is the API. Two teams writing against the same OpenAPI document can still ship subtly incompatible clients; two teams writing against the same .proto get wire-compatible encoders and decoders by construction (compatible, though not necessarily byte-identical: Protobuf serialisation is not canonical).

Adjacent ecosystems have explored other corners of the design space. GraphQL (Facebook, open-sourced 2015) addresses a different problem, letting the client choose which fields it wants, at the cost of HTTP cacheability and a more complex server-side resolver model. tRPC (Alex Johansson, 2020) skips IDLs entirely; if both ends are TypeScript, the type system is the schema, and the network call infers its types from the function signature. Connect (Buf, 2022) serves the gRPC, gRPC-Web, and Connect protocols from the same handler, so browsers can call it without a separate gRPC-Web proxy, and the Connect protocol's JSON-over-HTTP mode means curl still works. Cap'n Proto RPC (Kenton Varda, 2014) goes further with promise pipelining: the result of one call can be passed as the argument to a second before the first has completed, so a chain of dependent calls costs a single network round trip instead of one per call.

Choosing between these is mostly a question of how heterogeneous your fleet is, how committed you are to one language, and whether you need browsers as first-class clients. A pure-TypeScript monorepo gets the most use from tRPC; a Go/Java/Python/Rust polyglot mesh gets it from gRPC; a public web API with thousands of unknown consumers gets it from REST + OpenAPI. The differences narrow further every year as Connect and gRPC-Web close the browser gap.

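syntax = "proto3";
package orders.v1;

service OrderService {
  rpc CreateOrder(CreateOrderRequest) returns (Order);       // unary
  rpc ListOrders(ListOrdersRequest) returns (stream Order);  // server streaming
  rpc UploadEvents(stream OrderEvent) returns (Ack);         // client streaming
  rpc Chat(stream Message) returns (stream Message);         // bidirectional
}

message Order {
  string id = 1;
  string customer_id = 2;
  repeated LineItem items = 3;
  Status status = 4;
  enum Status { PENDING = 0; PAID = 1; SHIPPED = 2; }  // proto3: first value must be 0
}

// Minimal placeholder definitions (illustrative fields) so the file compiles.
message LineItem { string sku = 1; int32 qty = 2; }
message CreateOrderRequest { string customer_id = 1; repeated LineItem items = 2; }
message ListOrdersRequest { string customer_id = 1; }
message OrderEvent { string order_id = 1; string kind = 2; }
message Ack { int32 received = 1; }
message Message { string text = 1; }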

Service meshes speak gRPC — Envoy, Istio, Linkerd

Service meshes speak gRPC.

Inside Google, Stubby handles tens of billions of RPCs per second across the fleet, and the open-source gRPC was built as its public successor. Outside Google, gRPC is the lingua franca of the cloud-native data plane. Envoy (Lyft, 2016, now a CNCF graduated project) was designed from the start to terminate, inspect, and proxy gRPC; it parses HTTP/2 framing, understands gRPC trailers, and can extract the grpc-status code into its access log. Istio (2017) builds on Envoy, while Linkerd (Buoyant, originally 2016 in Scala, rewritten as Linkerd2 with a Go control plane and a Rust proxy) uses its own proxy; both enforce mTLS, retry policies, circuit breaking, and traffic splitting on gRPC traffic.

REST is the lingua franca of public APIs. Stripe, Twilio, GitHub, Slack, Shopify, Discord, Cloudflare, and nearly every payment processor and cloud provider control plane expose REST + JSON to the world, even when their internal traffic is something else entirely. Stripe famously hand-writes its REST API documentation and publishes open-source SDKs in seven languages built on top of it; the SDK code is partially generated from an internal IDL but the published API is REST-shaped because that is what their customers' tools assume.

Several teams have published quantitative comparisons. Yelp's engineering blog (2019) measured a 5–10× reduction in p99 internal-RPC latency moving from JSON over HTTP/1.1 to gRPC, with most of the gain coming from connection reuse rather than serialisation. Buf's benchmarks (2022) reported Connect-Go matching grpc-go on raw throughput while shaving allocations by roughly 30%. Bowman et al.'s 2022 paper An Empirical Evaluation of gRPC and REST Performance measured serialisation costs on equivalent payloads: Protobuf was 2–8× smaller and 3–10× faster to encode/decode than JSON across realistic message shapes. None of these benchmarks tell the whole story; they all tell some of it.

A revealing pattern: companies that started gRPC-only often add a REST facade later (Square, Uber, Dropbox), and companies that started REST-only often add gRPC for internal traffic later (Netflix, Spotify, Coinbase). The convergence point is the hybrid — gRPC inside, REST or REST-shaped at the edge — and the bridges that make it work are now mature enough that the choice is less fraught than it was five years ago.

Figure: gRPC-Web, browser → Envoy → gRPC server. Browser: grpc-web (binary, or base64 text mode) over HTTP/1.1, with trailers carried in the body. Envoy: the grpc_web filter translates. gRPC server: native binary Protobuf over HTTP/2 with real trailers. The proxy hop reconciles the browser with HTTP/2.
The HTTP/2 status trap

Every gRPC response from a gRPC server is HTTP 200, even when the call failed. Real status lives in the grpc-status trailer (or, for a Trailers-Only response that fails before sending any message, in the response headers). Logging your edge proxy's HTTP status code as a success/failure signal will report 100% success while half your fleet is throwing INTERNAL or UNAVAILABLE. Always extract grpc-status; never trust the header status line for gRPC.
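A sketch of outcome classification that reads grpc-status instead of the HTTP status line, in Python. The function name and the dict-shaped headers and trailers are assumptions for illustration. The HTTP-to-gRPC fallback table follows the mapping documented in the gRPC repository for responses that carry no grpc-status at all, typically an error page from an intermediary. Counting a 200 with no grpc-status as UNKNOWN is this sketch's own choice.

```python
# The 17 gRPC status codes; the list index is the numeric grpc-status value.
GRPC_CODES = [
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
]

# Fallback when no grpc-status arrived at all (e.g. an intermediary's error page).
HTTP_TO_GRPC = {400: "INTERNAL", 401: "UNAUTHENTICATED", 403: "PERMISSION_DENIED",
                404: "UNIMPLEMENTED", 429: "UNAVAILABLE", 502: "UNAVAILABLE",
                503: "UNAVAILABLE", 504: "UNAVAILABLE"}

def grpc_outcome(http_status: int, headers: dict, trailers: dict) -> str:
    """Classify one call by grpc-status, never by the HTTP status line alone."""
    # Usual case: grpc-status in trailers. A Trailers-Only response carries it
    # in its single HEADERS frame instead, so fall back to the headers.
    raw = trailers.get("grpc-status", headers.get("grpc-status"))
    if raw is not None:
        code = int(raw)
        return GRPC_CODES[code] if 0 <= code < len(GRPC_CODES) else "UNKNOWN"
    if http_status != 200:
        return HTTP_TO_GRPC.get(http_status, "UNKNOWN")
    return "UNKNOWN"  # 200 with no grpc-status: counted as a failure (assumption)

# HTTP 200 on the wire, but the call failed:
print(grpc_outcome(200, {}, {"grpc-status": "14", "grpc-message": "upstream reset"}))  # UNAVAILABLE
```

A success-rate dashboard built on grpc_outcome(...) == "OK" reports the failures that a check on http_status == 200 would hide.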


Where the abstraction leaks — browsers, proxies, load balancers

Where the abstraction leaks.

gRPC's biggest production hazard is load balancing. A naive Layer-4 (TCP) load balancer hashes on the connection 5-tuple. Once a client opens a long-lived HTTP/2 connection to a backend, every multiplexed RPC on that connection lands on the same backend — even if you scale the backend pool from one replica to ten, the existing client traffic stays glued to the original replica. Solutions exist (client-side load balancing with periodic re-resolution; Layer-7 proxies that load-balance per-request; gRPC's xDS support pushed via Envoy or proxyless via grpc-go's xDS resolver) but they require deliberate configuration. A team that "just put gRPC behind an AWS NLB" frequently discovers their fleet is unbalanced only when one host is at 95% CPU and the others are idle.
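A toy simulation of the imbalance, in Python. It is not a real balancer. Replica indices stand in for hosts, and placing each connection by client index modulo the replica count is a simplifying assumption.

```python
from collections import Counter
from itertools import cycle

def connection_pinned(clients: int, calls_each: int,
                      replicas_at_connect: int, replicas_now: int) -> Counter:
    """L4 balancing: each client's long-lived HTTP/2 connection was placed when
    only `replicas_at_connect` replicas existed, and every RPC rides that connection."""
    load = Counter({r: 0 for r in range(replicas_now)})
    for client in range(clients):
        replica = client % replicas_at_connect  # placement fixed at connect time
        load[replica] += calls_each
    return load

def per_request_round_robin(clients: int, calls_each: int, replicas_now: int) -> Counter:
    """L7 or client-side balancing: every RPC picks the next replica."""
    load = Counter({r: 0 for r in range(replicas_now)})
    picker = cycle(range(replicas_now))
    for _ in range(clients * calls_each):
        load[next(picker)] += 1
    return load

# Eight clients connected while there was one replica; the pool is now ten.
pinned = connection_pinned(8, 100, replicas_at_connect=1, replicas_now=10)
spread = per_request_round_robin(8, 100, replicas_now=10)
print(pinned[0], max(pinned[r] for r in range(1, 10)))  # 800 0
print(sorted(set(spread.values())))                     # [80]
```

Scaling the pool does nothing for the pinned case until clients reconnect, which is why the fixes above either re-resolve periodically or balance per request.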

REST's biggest production hazard is the opposite — connection churn. A naive HTTP/1.1 client opens a fresh TCP connection per call, pays the three-way handshake, the TLS handshake, the slow-start ramp, and tears it down. At a few hundred QPS this is invisible; at tens of thousands of QPS the kernel runs out of ephemeral ports, the proxy's TIME_WAIT table fills, and the latency floor doubles. Connection pooling and HTTP/2 fix it but require client-side discipline. A common postmortem shape: "we migrated from internal gRPC to internal REST for debuggability, p99 latency went up 4×, root cause was connection churn under HTTP/1.1."

Both protocols share the retry hazard. A client that retries a non-idempotent call after a network blip — and the original request actually succeeded — has now duplicated whatever side effect the call had. REST sort-of solves this with idempotency keys (Stripe's Idempotency-Key header, GitHub's similar pattern); gRPC sort-of solves it with the retry service config and a server-side requirement that operations be idempotent. Neither makes the problem disappear; both surface it as a discipline question that must be answered per-method.
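A minimal in-memory sketch of server-side deduplication by idempotency key, in Python. The class and method names are hypothetical. The sketch assumes the key reaches the server somehow, as an HTTP header like Stripe's Idempotency-Key or as a gRPC metadata entry or request field. That transport choice is an assumption, not a built-in gRPC feature.

```python
import uuid

class OrderService:
    """Toy server that deduplicates retries by idempotency key (hypothetical;
    a real one persists keys with a TTL and rejects a reused key whose
    request body differs)."""

    def __init__(self):
        self.orders = []
        self._responses = {}  # idempotency key -> first response

    def create_order(self, idempotency_key: str, customer_id: str) -> dict:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # replay, no new side effect
        order = {"id": f"o{len(self.orders) + 1}", "customer_id": customer_id}
        self.orders.append(order)  # the side effect a blind retry would duplicate
        self._responses[idempotency_key] = order
        return order

svc = OrderService()
key = str(uuid.uuid4())  # the client mints one key per logical operation...
first = svc.create_order(key, "c12")
retry = svc.create_order(key, "c12")  # ...and reuses it when retrying
print(first == retry, len(svc.orders))  # True 1
```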

Streaming has its own hazards. A bidirectional gRPC stream that the client forgets to close keeps an HTTP/2 stream open; with enough leaked streams, a connection hits its SETTINGS_MAX_CONCURRENT_STREAMS limit (100 is a common setting, and the HTTP/2 spec recommends not going lower), after which new calls on that connection queue or fail. A server-streaming call where the client stops reading but doesn't cancel forces the server to buffer until the per-stream flow-control window fills, at which point the server blocks, possibly for the lifetime of the deadline. Both failure shapes only emerge under load and are easy to miss in test environments.
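A toy model of why a non-reading client stalls a streaming server, in Python. It models a single stream's send window at the HTTP/2 default initial size of 65,535 bytes. Many gRPC runtimes enlarge windows dynamically, and a connection-level window also applies, so treat the numbers as illustrative. The class name is hypothetical.

```python
class StreamWindow:
    """One HTTP/2 stream's send window (see RFC 9113, flow control). The sender
    may send DATA only while the window has room; the receiver re-opens it with
    WINDOW_UPDATE as its application consumes bytes."""
    INITIAL = 65_535  # default SETTINGS_INITIAL_WINDOW_SIZE

    def __init__(self, size: int = INITIAL):
        self.available = size

    def try_send(self, n: int) -> bool:
        # Simplification: whole messages only. A real sender may split DATA
        # frames to use the remaining bytes before blocking.
        if n > self.available:
            return False  # the sender must block (or buffer) until WINDOW_UPDATE
        self.available -= n
        return True

    def window_update(self, n: int) -> None:
        self.available += n

# The server streams 1,000-byte messages; the client has stopped reading.
window = StreamWindow()
sent = 0
while window.try_send(1_000):
    sent += 1
print(sent)  # 65
```

After 65 messages the server is stuck. Nothing moves until the client reads (producing a WINDOW_UPDATE), cancels, or the deadline fires.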


Performance — bytes, microseconds, dollars

Bytes, microseconds, dollars.

A 200-byte JSON document (say, an order with five fields and a small array of line items) typically encodes to 60–80 bytes of Protobuf. The compression comes from three places. Field names disappear from the wire; only their numeric tags travel. Integers are varint-encoded, so a value under 128 takes a single byte (plus a one-byte tag for fields numbered 1–15), where JSON spends one to three digits plus the quoted key, colon, and comma. Booleans, enums, and small numbers each fit in two bytes including the tag, against eight or more for JSON ("x":true, even with a one-letter key; an enum travels as a quoted name such as "PENDING"). On chunky payloads with many short numeric fields the ratio widens; on payloads dominated by long strings (text bodies, base64-encoded blobs) the ratio narrows toward 1:1.
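A hand-rolled encoder for a small example order, in Python with only the standard library, makes the three sources of savings concrete. It assumes the field numbers of the Order message in the .proto example, plus LineItem { sku = 1; qty = 2; }. LineItem's fields are an assumption, since the original does not define them. Real code would use generated classes rather than hand-written encoders.

```python
import json

def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | (0x80 if n else 0))
        if not n:
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    # Wire type 0 = varint, 2 = length-delimited (strings, embedded messages).
    return varint((field_number << 3) | wire_type)

def string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return tag(field_number, 2) + varint(len(data)) + data

def varint_field(field_number: int, value: int) -> bytes:
    return tag(field_number, 0) + varint(value)

def encode_line_item(sku: str, qty: int) -> bytes:  # LineItem { sku = 1; qty = 2; }
    return string_field(1, sku) + varint_field(2, qty)

def encode_order(order: dict) -> bytes:
    out = string_field(1, order["id"]) + string_field(2, order["customer_id"])
    for item in order["items"]:  # repeated embedded message, field 3
        body = encode_line_item(item["sku"], item["qty"])
        out += tag(3, 2) + varint(len(body)) + body
    status = {"PENDING": 0, "PAID": 1, "SHIPPED": 2}[order["status"]]
    if status != 0:  # proto3 does not emit a scalar equal to its default
        out += varint_field(4, status)
    return out

order = {"id": "o7", "customer_id": "c12",
         "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}],
         "status": "PENDING"}
as_json = json.dumps(order, separators=(",", ":")).encode()
as_proto = encode_order(order)
print(len(as_json), len(as_proto))  # 100 23
print(as_proto.hex(" "))
```

The status field is absent because PENDING is the zero default. Set status to PAID and two bytes, 20 01, appear at the end.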

Encoding speed differs by a similar factor. On a recent x86 server with one 500-byte message, encoding to JSON via encoding/json in Go takes about 1.5 microseconds; encoding the same message to Protobuf via google.golang.org/protobuf takes about 0.3 microseconds. Decoding shows a similar ratio. The numbers shrink by 2× if you switch to a faster JSON library (jsoniter, or a SIMD parser such as simdjson on the decode side), and shrink by another 2× if you switch to vtprotobuf for Protobuf, but the relative gap stays roughly intact. At 100k QPS per process, the 1.2μs difference adds up to 0.12 CPU-seconds every second: about 12% of one core for encoding alone, before decoding is counted.

Network egress costs follow payload size linearly. Taking roughly $0.09/GB as the transfer price (cross-region rates on AWS vary by region pair, and many are lower), a service moving 100 TB/month of internal RPC traffic spends about $9,000/month on transfer alone. A 3× reduction in payload size from JSON-over-HTTP/1.1 to Protobuf-over-HTTP/2 saves about $6,000/month on that one workload, on top of the serialisation CPU it saves on both ends. For internal mesh traffic at any meaningful scale, the financial argument tends to dominate the architectural one.
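The same arithmetic as a tiny Python cost model. The flat $0.09/GB rate is the text's round figure and an assumption here. Real bills are tiered and vary by region pair, so plug in your own rate.

```python
def monthly_transfer_usd(tb_per_month: float, usd_per_gb: float) -> float:
    """Flat-rate transfer cost, using decimal TB (1 TB = 1,000 GB) as in the text."""
    return tb_per_month * 1_000 * usd_per_gb

def monthly_savings_usd(tb_per_month: float, usd_per_gb: float, shrink: float) -> float:
    """What you save when payloads become `shrink` times smaller."""
    before = monthly_transfer_usd(tb_per_month, usd_per_gb)
    after = monthly_transfer_usd(tb_per_month / shrink, usd_per_gb)
    return before - after

print(round(monthly_transfer_usd(100, 0.09)))      # 9000
print(round(monthly_savings_usd(100, 0.09, 3.0)))  # 6000
```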

Tail latency is harder to summarise. Both protocols can deliver sub-millisecond p99 in a healthy mesh, and both can spike to seconds under retry storms or GC pauses. The signal that matters is consistency: gRPC's connection reuse and binary framing tend to give a flatter latency distribution under load, where REST over HTTP/1.1 shows a long tail driven by handshake variance. This shrinks once REST moves to HTTP/2 with connection pooling, which is increasingly the default in modern client libraries.


When to reach for each — gRPC vs REST decision matrix

When to reach for each.

A working heuristic. External API consumed by humans, browsers, partner code in unknown languages: REST + JSON, with OpenAPI as the published schema. Caching, debugging, the breadth of tooling, and the existence of the entire HTTP ecosystem — load balancers, CDNs, WAFs, observability agents — all want REST. Public Stripe, GitHub, Slack APIs are REST for excellent reasons.

Internal mesh among services you control, polyglot or single-language: gRPC. The schema is enforced at compile time, the wire is small, the streaming is native, the deadlines and cancellation propagate, the codegen produces typed clients in every language. Service meshes and observability tools speak gRPC fluently. The debuggability gap narrows with grpcurl, grpc-cli, and Wireshark's gRPC dissector; not as good as curl, but adequate for production work.

Mixed: browser apps that need real-time data plus a REST surface for partners: pick a Connect-style runtime that speaks all three protocols (gRPC, gRPC-Web, JSON-over-HTTP) on the same endpoint and let the client choose. Buf's Connect framework, ConnectRPC for Swift and Kotlin, and the increasing adoption of .proto as the universal IDL are all moving the industry toward this point. Define the schema once; generate everything from it; let each consumer pick the wire that suits them.

What not to do: pick gRPC because it is faster on a benchmark, then deploy it without per-request load balancing (an L7 proxy, or client-side balancing fed by DNS or xDS), without observability tooling that understands gRPC trailers, and without a debugging story for engineers who are used to curl. The protocol is fine; the surrounding ecosystem must come with it. Conversely, do not stay on REST forever just because it is familiar: at sufficient scale the JSON-parse cost and the HTTP/1.1 connection churn become real money, and a Connect-shaped migration that keeps the JSON-over-HTTP option open while adding a gRPC option is usually less work than expected.

One last consideration: schema governance. A gRPC fleet has a single source of truth that every service must agree on, and Buf's tooling enforces it at code-review time: buf breaking checks backward compatibility and buf lint checks style. A REST fleet usually relies on shared good intentions, OpenAPI documents that drift from reality, and a string of incidents to teach the lesson that breaking changes ship in subtle ways. Whether that strict-or-lax governance is what you want depends entirely on the size of the team and the cost of an outage; both regimes can work, both can fail, and the choice tends to reveal more about the organisation than about the protocol.
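A much-simplified sketch of the kind of check buf breaking automates, in Python. The schema representation (field number → (name, type)) and the three rules are assumptions for illustration. buf's real rule set is far larger and operates on full .proto descriptors.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified schema: {field_number: (field_name, type_name)}.
Schema = Dict[int, Tuple[str, str]]

def breaking_changes(old: Schema, new: Schema) -> List[str]:
    problems = []
    for number, (name, typ) in sorted(old.items()):
        if number not in new:
            problems.append(f"field {number} '{name}' deleted (reserve the number instead)")
            continue
        new_name, new_typ = new[number]
        if new_typ != typ:
            problems.append(f"field {number} type changed {typ} -> {new_typ}")
        if new_name != name:
            problems.append(f"field {number} renamed {name} -> {new_name} "
                            "(binary wire unaffected; JSON and generated code break)")
    return problems  # fields present only in `new` are additions, which are safe

v1 = {1: ("id", "string"), 2: ("customer_id", "string"), 4: ("status", "Status")}
v2 = {1: ("id", "string"), 2: ("customer_id", "int64"), 5: ("note", "string")}
for problem in breaking_changes(v1, v2):
    print(problem)
```

Run in CI against the last released schema, a check like this turns "breaking changes ship in subtle ways" into a failed build instead of an incident.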


Further reading on gRPC vs REST

Primary sources, in order.
