gRPC
gRPC takes Protobuf, runs it over HTTP/2, and generates typed client and server stubs in most major languages. It started as Google's internal "Stubby" and was open-sourced in 2015. If you have control of both sides of the wire and you want strict typing, fast serialisation, and built-in streaming, it's a sensible default. There are also workloads where it adds more friction than it removes — those are at the bottom of this page.
Three layers
- IDL. You write a .proto file describing services and messages. Same syntax as Protobuf; see the Protobuf deep dive.
- Codegen. protoc generates client and server stubs in Go, Java, Python, Rust, C++, Swift, Kotlin, Ruby, Node, Dart, and more.
- Runtime. The wire format is Protobuf-encoded binary frames carried over HTTP/2 streams. Each method call is one stream; messages are length-prefixed frames within that stream.
// users.proto
syntax = "proto3";
package users.v1;
service Users {
rpc Get(GetReq) returns (User); // unary
rpc List(ListReq) returns (stream User); // server streaming
rpc BulkCreate(stream User) returns (BulkResp); // client streaming
rpc Chat(stream Msg) returns (stream Msg); // bidirectional
}
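message GetReq { string id = 1; }
message User { string id = 1; string email = 2; }
// ListReq, BulkResp, and Msg are omitted here for brevity.
Run protoc --go_out=. --go-grpc_out=. users.proto and you have typed
Go client and server stubs (the Go plugins also need a go_package option or an
M flag to know the import path). Run the grpcio-tools wrapper, python -m
grpc_tools.protoc -I. --python_out=. --grpc_python_out=. users.proto, and you
have Python stubs. Same .proto, every language.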
Why HTTP/2
HTTP/1.1 carries one in-flight request per connection (or per slot in a connection pool). HTTP/2 multiplexes many concurrent streams over a single TCP connection, with header compression (HPACK) and binary framing. gRPC needs all three:
- Multiplexing lets a long-lived connection carry hundreds of in-flight calls without head-of-line blocking at the application layer.
- Streams let you send a sequence of messages without re-establishing the connection. The four streaming modes are direct projections of HTTP/2 streams.
- HPACK keeps per-call header overhead near zero. Headers like :method POST and content-type application/grpc compress to single-byte references after the first call.
HTTP/2 is also the reason gRPC doesn't run in browsers, which don't expose raw HTTP/2 streams to JavaScript. gRPC-Web is the workaround: a proxy translates between gRPC-over-HTTP/2 and a browser-friendly framing that fetch() can read, over HTTP/1.1 or HTTP/2. It strips client-streaming and bidi modes; only unary and server-streaming survive.
The four streaming modes
| Mode | Client | Server | Use for |
|---|---|---|---|
| Unary | 1 msg | 1 msg | Most calls — request/response. |
| Server stream | 1 msg | N msgs | Pagination, search results, log tailing, AI token streaming. |
| Client stream | N msgs | 1 msg | Bulk uploads, telemetry batches, voice → transcript. |
| Bidi stream | N msgs | N msgs | Chat, real-time games, collaborative editing, transcription. |
Both ends know the mode at codegen time, so the generated stubs expose distinct types.
A unary stub returns (User, error); a server-stream stub returns a
UserStream that you iterate. Bidi stubs let you send and receive on the
same handle concurrently.
Status codes — gRPC's own, not HTTP's
gRPC has its own 17-value status code table. They cover the things RPC actually cares about, in language richer than HTTP:
| Code | Meaning |
|---|---|
| OK (0) | Success. |
| CANCELLED (1) | Operation was cancelled, typically by the caller. |
| UNKNOWN (2) | Unknown error, for example a status from an unrecognised error space. |
| INVALID_ARGUMENT (3) | Bad input, regardless of system state. |
| DEADLINE_EXCEEDED (4) | Caller's deadline elapsed. |
| NOT_FOUND (5) | Resource doesn't exist. |
| ALREADY_EXISTS (6) | Resource already exists. |
| PERMISSION_DENIED (7) | Authenticated, not authorized. |
| RESOURCE_EXHAUSTED (8) | Quota / rate limit hit. |
| FAILED_PRECONDITION (9) | System state forbids this operation now. |
| ABORTED (10) | Concurrency conflict; retry the whole transaction. |
| OUT_OF_RANGE (11) | Operation attempted past the valid range, such as reading past end of file. |
| UNIMPLEMENTED (12) | Method not implemented (yet). |
| INTERNAL (13) | Server bug or invariant broken. |
| UNAVAILABLE (14) | Service temporarily unreachable. Retryable. |
| DATA_LOSS (15) | Unrecoverable data corruption. |
| UNAUTHENTICATED (16) | No / invalid credentials. |
Map them to HTTP only at the gateway, never inside. The standard mapping
(NOT_FOUND → 404, PERMISSION_DENIED →
403) is a one-way translation; gRPC clients should consume the gRPC code
directly.
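A gateway-side translation can be sketched as a lookup table. This is a minimal sketch, not the code of any particular gateway: the function name httpStatusFromGRPC and the string-keyed table are illustrative assumptions, and the values follow the mapping documented alongside google.rpc.Code (the one grpc-gateway uses). Unknown names fall back to 500.

```go
package main

import (
	"fmt"
	"net/http"
)

// httpFromGRPC maps gRPC status code names to HTTP statuses for use at the
// edge only. The table layout is an illustrative choice.
var httpFromGRPC = map[string]int{
	"OK":                  http.StatusOK,
	"CANCELLED":           499, // "client closed request", a non-standard code
	"UNKNOWN":             http.StatusInternalServerError,
	"INVALID_ARGUMENT":    http.StatusBadRequest,
	"DEADLINE_EXCEEDED":   http.StatusGatewayTimeout,
	"NOT_FOUND":           http.StatusNotFound,
	"ALREADY_EXISTS":      http.StatusConflict,
	"PERMISSION_DENIED":   http.StatusForbidden,
	"RESOURCE_EXHAUSTED":  http.StatusTooManyRequests,
	"FAILED_PRECONDITION": http.StatusBadRequest,
	"ABORTED":             http.StatusConflict,
	"OUT_OF_RANGE":        http.StatusBadRequest,
	"UNIMPLEMENTED":       http.StatusNotImplemented,
	"INTERNAL":            http.StatusInternalServerError,
	"UNAVAILABLE":         http.StatusServiceUnavailable,
	"DATA_LOSS":           http.StatusInternalServerError,
	"UNAUTHENTICATED":     http.StatusUnauthorized,
}

// httpStatusFromGRPC returns the HTTP status for a gRPC code name,
// defaulting to 500 for names it does not know.
func httpStatusFromGRPC(code string) int {
	if s, ok := httpFromGRPC[code]; ok {
		return s
	}
	return http.StatusInternalServerError
}

func main() {
	for _, c := range []string{"NOT_FOUND", "PERMISSION_DENIED", "UNAVAILABLE"} {
		fmt.Println(c, "->", httpStatusFromGRPC(c))
	}
}
```

Several gRPC codes collapse onto the same HTTP status (400, 409, 500), which is why the translation cannot be reversed.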
Deadlines, not timeouts
A gRPC call carries a deadline for the whole request, not a per-hop timeout. If service A calls B with a 100ms deadline and B calls C, the deadline propagates: C only gets whatever budget is left. (The caller has to set one; a call without a deadline can wait indefinitely.) This is the single most under-appreciated feature of gRPC, and a large part of why it holds up in deep call chains.
// pseudocode: what happens on a gRPC call that sets a deadline
ctx, cancel := context.WithTimeout(parent, 100*time.Millisecond)
defer cancel()
// the remaining time travels with the call as the grpc-timeout header;
// the next service reads it, computes its remaining budget,
// and applies the same context to its outbound calls
resp, err := client.Get(ctx, &GetReq{Id: "42"})
// if 60ms elapsed before B calls C, C sees a 40ms deadline
// C cannot extend it. The deadline shrinks monotonically.
Without this, a slow downstream call holds the whole upstream chain. With it, every service stops working as soon as the user-facing call has given up. Tail-latency meltdowns become recoverable instead of fatal.
Interceptors — middleware for RPC
Cross-cutting concerns — auth, tracing, retries, metrics, panic recovery — go in interceptors, the gRPC equivalent of HTTP middleware. They wrap the call before/after the user code runs.
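// Unary server interceptor: reject calls without a valid token.
// validate and streamAuthInterceptor are application-defined (not shown).
import (
    "context"
    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/metadata"
    "google.golang.org/grpc/status"
)
func authInterceptor(
    ctx context.Context, req any,
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (any, error) {
    md, ok := metadata.FromIncomingContext(ctx)
    if !ok || !validate(md.Get("authorization")) {
        return nil, status.Error(codes.Unauthenticated, "bad token")
    }
    return handler(ctx, req)
}
server := grpc.NewServer(
    grpc.UnaryInterceptor(authInterceptor),
    grpc.StreamInterceptor(streamAuthInterceptor),
)
Production stacks chain four to six interceptors: panic recovery on the outside, then tracing, then metrics, then auth, then optional rate-limiting and logging. In grpc-go, grpc.ChainUnaryInterceptor and grpc.ChainStreamInterceptor install several in order. Each one is ~30 lines of code, runs on every call, and stays out of business logic.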
Where gRPC fits well
- Internal microservices. Strict schemas, often several times the throughput of JSON+REST (5–10× is a commonly quoted range and depends on payload shape), typed clients in every language. A single .proto file replaces ten hand-written client libraries.
- Polyglot teams. The Go service, the Python data-science service, and the iOS client all consume stubs generated from the same .proto. No translation layer.
- Streaming workloads. Real-time push, log/metric pipelines, IoT telemetry, mobile sync. The four streaming modes cover the design space.
- Service meshes. gRPC plays well with Envoy, Linkerd, Istio. Built-in load balancing, retry budgets, and circuit-breaker integration.
Where it adds friction
- Browsers. No raw HTTP/2 access, so you need gRPC-Web, which drops the client-streaming and bidi modes and adds a proxy. Connect Web (Buf) is the modern alternative.
- Public APIs. Customers can't curl it; tooling is weaker than REST; debugging requires the .proto file. Most public APIs that "support gRPC" actually ship a REST gateway and call gRPC from inside.
- Debugging. Wireshark dumps look like binary noise without the .proto file. Reach for grpcurl with Reflection enabled on dev/staging.
- Load balancing. Long-lived HTTP/2 connections trip up L4 balancers: once a connection sticks to a backend, all its streams stick too. Either use L7 LBs (Envoy, Linkerd) or do client-side round-robin.
- Schema breakage. Easier to silently break clients than with REST, because field-number reuse mis-decodes silently. Use Buf's breaking-change linter as a CI gate.
gRPC vs REST in practice
A reasonable default: REST/JSON for public-facing edges, gRPC for everything
server-to-server. Many shops front their gRPC services with a thin REST
gateway (grpc-gateway generates one from the same .proto). You get binary
efficiency inside and a familiar HTTP+JSON face outside. The .proto file is the source
of truth either way.
The mistake is treating it as either/or. Most non-trivial systems run both: gRPC for internal calls between services that share a binary fence, REST/JSON for the public edge where curl-ability and tooling matter more than per-call CPU.
Load balancing — the hidden problem
HTTP/2 multiplexes many requests over one TCP connection. That's a feature for latency and a problem for load balancing. An L4 (TCP-level) load balancer like AWS NLB sees one connection from each client and routes it to one backend; once pinned, every subsequent request rides that same connection, regardless of how loaded the chosen backend is. The result: skewed load, "noisy backend" tail latency, and that one pod you mysteriously can't drain.
Three strategies for fixing it, in increasing sophistication:
| Strategy | How it works | Trade-off |
|---|---|---|
| L7 / proxy-side balancing | Run an HTTP/2-aware proxy (Envoy, Linkerd, NGINX with grpc_pass) in front of the backends. The proxy sees individual RPCs and routes each one. | Extra hop adds 0.5-1 ms; the proxy is a new failure domain. |
| Client-side balancing | The client gets the list of backends from a name resolver (DNS, xDS, etcd) and opens a connection to each. Each RPC picks a backend via round-robin or pick-first. | Requires SDK support; client + server fan-out grows N×M; reconnect storms on backend churn. |
| Look-aside with xDS | gRPC clients talk to a control plane (xDS) that pushes endpoint + weight updates. The dataplane stays direct (no proxy in the request path) but routing decisions are centralised. | Most operational complexity; near-zero data-path latency overhead; the model behind Istio + Consul Connect. |
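The per-RPC choice in the client-side strategy boils down to a small picker. The sketch below is a simplified illustration, not grpc-go's balancer API: the roundRobin type is hypothetical, the address list is fixed rather than fed by a resolver, and the addresses are made up.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out backend addresses in rotation. A real client would
// hold one connection per backend and refresh the list from its resolver.
type roundRobin struct {
	next     uint64 // kept first so atomic access is aligned on 32-bit platforms
	backends []string
}

// pick returns the next backend; it is safe to call from many goroutines.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.backends[n%uint64(len(r.backends))]
}

func main() {
	rr := &roundRobin{backends: []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}}
	counts := map[string]int{}
	for i := 0; i < 9; i++ {
		counts[rr.pick()]++
	}
	fmt.Println(counts)
}
```

Because the choice is made per RPC rather than per connection, load evens out even though every connection is long-lived.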
Versioning Protobuf — what you can change, and when
Protobuf wire compatibility is famously good, but only if you follow specific rules. The wire format encodes only field tag numbers, not names; the field name and type at compile time are interpretation, not transport. So you can rename anything; you cannot reuse tag numbers without corrupting historic readers.
| Change | Safe? | Notes |
|---|---|---|
| Add a new optional field with a new tag number | Yes | Old code skips it; new code sees the default. |
| Rename a field (same tag, same type) | Yes | Wire format ignores names. Source code on both sides re-builds. |
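| Change optional ↔ repeated on the same tag | No (mostly) | Packed numeric repeated fields use a different wire type. For string, bytes, and message fields the bytes still parse, but a singular reader keeps only the last element (or merges messages). |
| Change int32 ↔ uint32 ↔ int64 | Yes (with caveats) | Same varint wire type. Sign extension on negative ints can surprise. |
| Change int32 ↔ fixed32 | No | Different wire types (varint vs 32-bit fixed). |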
| Remove a field | Maybe | Old code keeps writing it; new code ignores it. Reserve the tag number so it's never reused. |
| Add a new enum value | Yes for proto3 | Old readers see the numeric value if their generated enum doesn't recognise it. |
| Reorder fields in the .proto file | Yes | Wire format is keyed on tag number, not file order. |
The two patterns that keep services compatible across years: always add
reserved 7, 8; when removing fields (protoc then rejects any reuse of those
numbers within that message), and never reuse a tag number even for a
"different but semantically equivalent" field; give it a new tag.
message User {
string id = 1;
string email = 2;
// reserved field 3 — was "phone_number" before we moved phones to a sub-message
reserved 3;
reserved "phone_number";
PhoneNumber phone = 4;
}
Compression, message size, and the network bill
gRPC compresses per message, not per connection. By default the implementations
ship with gzip, with an opt-in for deflate and the
much newer zstd (gRPC-Go 1.55+, Java 1.62+). For binary protobuf
the gains are smaller than text — typical 20-40% compression vs 70%+ on JSON —
but for sustained streaming or for messages with large repeated string fields,
it still earns its keep.
Three production gotchas worth knowing:
- The 4 MB default max_receive_message_size. Most language implementations cap incoming messages at 4 MB. That's fine for typical RPC, but if you send a large batched response or a binary blob inline, the receiver will reject with RESOURCE_EXHAUSTED. Either bump the cap (grpc.MaxRecvMsgSize) or stream the payload.
- HTTP/2 frame size limits. Each HTTP/2 frame is at most 16 KB by default (SETTINGS_MAX_FRAME_SIZE = 16,384), configurable up to 16 MB. Large messages get split into multiple frames within the same stream. This is transparent to the application but affects how head-of-line blocking can manifest within a single RPC.
- Compression and observability. Tracing infrastructure (Jaeger, Honeycomb) often samples request/response bytes. With compression on, sampled bytes are still in the compressed wire format, meaning you can't grep response bodies for debugging. Most teams disable compression on traced calls or sample at the proto-message level.
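"Stream the payload" usually means cutting a large blob into pieces that each fit well under the receive limit and sending one piece per streamed message. A minimal sketch of the splitting step follows; the chunk helper and the 1 MiB piece size are illustrative assumptions, and the actual Send calls on a client-streaming or server-streaming stub are left out.

```go
package main

import "fmt"

// chunk splits payload into pieces of at most size bytes. The pieces share
// memory with payload, so nothing is copied.
func chunk(payload []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var out [][]byte
	for len(payload) > 0 {
		n := size
		if len(payload) < n {
			n = len(payload)
		}
		out = append(out, payload[:n])
		payload = payload[n:]
	}
	return out
}

func main() {
	blob := make([]byte, 10<<20) // 10 MiB, too large for a 4 MB receive limit
	parts := chunk(blob, 1<<20)  // 1 MiB per streamed message
	fmt.Println(len(parts), "messages of at most", len(parts[0]), "bytes")
}
```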
An RPC on the wire — HTTP/2 frames, in order
A single unary gRPC call decomposes into a small handful of HTTP/2 frames. Once you've seen the sequence, every "why is my gRPC call doing X" question becomes easier to reason about, because you can map the symptom back to a specific frame at a specific point in the exchange.
Client Server
│ │
│ ── HEADERS (stream 1, END_STREAM=0) ────────────────────────────▶│
│ :method = POST │
│ :scheme = https │
│ :path = /catalog.v1.CatalogService/GetItem │
│ :authority = catalog.svc:443 │
│ content-type = application/grpc+proto │
│ te = trailers │
│ grpc-timeout = 3500m │
│ grpc-encoding = gzip │
│ authorization = Bearer eyJhbGciOi... │
│ │
│ ── DATA (stream 1, END_STREAM=1) ───────────────────────────────▶│
│ [1-byte flags | 4-byte length] [protobuf-serialised request]│
│ │
│ │ ◀── HEADERS
│ │ :status = 200
│ │ content-type = application/grpc+proto
│ │
│ │ ◀── DATA
│ │ [protobuf-serialised response]
│ │
│ │ ◀── HEADERS (TRAILERS, END_STREAM=1)
│ │ grpc-status = 0
│ │ grpc-message = (omitted on success)
Five details worth internalising. First, the message envelope inside each DATA
frame is compressed-flag (1 byte) + length (4 bytes) + payload. The
length lets the receiver find message boundaries even when HTTP/2 has split
the payload across multiple frames. Second, grpc-timeout carries the remaining
deadline on the wire as a relative value (3500m is 3,500 milliseconds); each
hop recomputes it before making its own downstream calls. Third, te: trailers
is required by spec; some HTTP middleboxes strip it and break gRPC silently.
Fourth, the gRPC status code rides on the trailers, not the response HEADERS,
which is why a 200 HTTP response can still represent a gRPC error, and why a
proxy that is not gRPC-aware can drop the trailers and corrupt the result.
Fifth, HTTP/2 stream identifiers are per-connection; client-initiated streams
take odd numbers that only ever increase, so a connection that exhausts its
31-bit ID space has to be replaced.
If a load balancer or proxy closes the connection or drops the trailers before
the client has read them, the client reports UNAVAILABLE with no body, not
success. This is a common source of "the server logged success but the client
saw failure" mysteries. The fix is usually a longer LB idle timeout or moving
to a gRPC-aware proxy that respects the trailer.
Authentication and credentials
gRPC separates channel credentials (the connection-level trust
model, almost always TLS) from call credentials (the per-RPC
authentication tokens, often a JWT or service-account assertion). The two layer:
the channel sets up the secure transport; call credentials ride as metadata on
each RPC. This is more granular than HTTP/REST defaults, which usually conflate
the two into a single Authorization header on every request.
| Credential | Layer | What it provides | Where it's used |
|---|---|---|---|
| TLS (server auth) | Channel | Server identity, encrypted transport. Client verifies cert chain. | Default for all public traffic. |
| mTLS (mutual TLS) | Channel | Both server AND client present certs. Identity flows in both directions. | Service mesh defaults — Istio, Linkerd, Consul Connect. |
| ALTS (Application Layer Transport Security) | Channel | Google's internal binding-protocol — identity via service account, no cert provisioning. | Google production. Available open-source but rare elsewhere. |
| JWT bearer token | Call | Per-RPC identity claim verified at the server. | User-facing APIs, machine-to-machine where mTLS would be heavy. |
| OAuth 2.0 access token | Call | Same shape as JWT but issued by a trusted IdP, often short-lived. | Public APIs, federated identity. |
| Google Compute Engine creds | Call | The instance's metadata-server token, auto-refreshed by the SDK. | GCP services calling other GCP services. |
| AWS SigV4 | Call | Per-request HMAC signature using the IAM role. | AWS-hosted gRPC services (less common — AWS leans REST). |
The recommended production posture is mTLS at the channel layer for service identity, plus a per-call JWT carrying the end-user identity. The mTLS cert says "this is the orders service" and the JWT says "running on behalf of user 42". A server-side interceptor extracts both, populates the request context, and the handler treats them as separate concerns. This split lets a misconfigured user token fail per-call without taking down the connection, and lets a rotating service identity be revoked without invalidating every in-flight user session.
// Go: composing channel + call credentials
// cert is the client's tls.Certificate and token a bearer token (both loaded elsewhere).
creds := credentials.NewTLS(&tls.Config{Certificates: []tls.Certificate{cert}})
perRPC := oauth.NewOauthAccess(&oauth2.Token{AccessToken: token})
// grpc.Dial still works; newer grpc-go releases prefer grpc.NewClient.
conn, err := grpc.Dial(addr,
    grpc.WithTransportCredentials(creds),
    grpc.WithPerRPCCredentials(perRPC),
)
Retries and hedging — the service-config knob
gRPC has a built-in retry mechanism configured via the service config —
a JSON document the client receives at name-resolution time (or hand-rolled into
the client). Two strategies are first-class: retry (try the
next backend on a failed call) and hedging (fire the call to
multiple backends in parallel, return the first success). Most teams reach for
retry naturally; hedging is the under-used one that wins tail latency for
read-heavy workloads.
{
"methodConfig": [{
"name": [{ "service": "catalog.v1.CatalogService", "method": "GetItem" }],
"retryPolicy": {
"maxAttempts": 4,
"initialBackoff": "0.1s",
"maxBackoff": "1s",
"backoffMultiplier": 2,
"retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
}
}, {
"name": [{ "service": "catalog.v1.CatalogService", "method": "Search" }],
"hedgingPolicy": {
"maxAttempts": 3,
"hedgingDelay": "0.05s",
"nonFatalStatusCodes": ["UNAVAILABLE"]
}
}]
}
Retry storms are kept in check by an optional token-bucket throttle on the
channel, set through retryThrottling in the same service config; while the
bucket is at or below half of maxTokens, calls fail through without retrying
even if their status is in the retryable list. A typical setting is maxTokens
10 with tokenRatio 0.1: each failed RPC costs one token and each success
refunds 0.1, generous enough for healthy systems, tight enough that a flapping
backend can't snowball.
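The throttle can be sketched as a small token bucket. This follows the rule described in the gRPC retry design (gRFC A6), where a failure costs one token, a success refunds tokenRatio, and retries pause while the bucket is at or below half full. The retryThrottle type and its method names are illustrative, not the library's internals; 10 and 0.1 are the values from the text.

```go
package main

import (
	"fmt"
	"sync"
)

// retryThrottle tracks the retry token bucket for one channel.
type retryThrottle struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	tokenRatio float64
}

func newRetryThrottle(maxTokens, tokenRatio float64) *retryThrottle {
	return &retryThrottle{tokens: maxTokens, maxTokens: maxTokens, tokenRatio: tokenRatio}
}

// onFailure records a failed RPC: one token is spent, never below zero.
func (t *retryThrottle) onFailure() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tokens--
	if t.tokens < 0 {
		t.tokens = 0
	}
}

// onSuccess records a successful RPC: tokenRatio is refunded, capped at maxTokens.
func (t *retryThrottle) onSuccess() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tokens += t.tokenRatio
	if t.tokens > t.maxTokens {
		t.tokens = t.maxTokens
	}
}

// allowRetry reports whether retries are currently permitted.
func (t *retryThrottle) allowRetry() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.tokens > t.maxTokens/2
}

func main() {
	th := newRetryThrottle(10, 0.1)
	for i := 1; i <= 6; i++ {
		th.onFailure()
		fmt.Printf("after %d failures: retries allowed = %v\n", i, th.allowRetry())
	}
}
```

With these values it takes ten successes to earn back what one failure costs, so a backend that fails even a modest share of calls quickly switches retries off.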
Two rules to follow: only retry methods you know to be idempotent, and never
retry calls whose status code might mean the server did the work but the
network swallowed the response (anything other than UNAVAILABLE and
RESOURCE_EXHAUSTED with the right details should be considered risky). Per the
gRPC retry design (gRFC A6), the client library also stops retrying once a
call is committed, for example once response headers have arrived.
With hedgingDelay: 50ms and maxAttempts: 2, the slow tail gets a second
chance: the first call that returns wins. p99 often drops by 5-10x. The cost
is a few percent extra QPS to the backend pool. This is the canonical
"tail-tolerant" pattern from Google's The Tail at Scale paper.
Connection lifecycle — keepalive and idle timeout
HTTP/2 connections are long-lived. That's by design — connection setup (TCP + TLS) is expensive, and reusing one connection for thousands of RPCs is the whole point. But a long-lived connection that nobody knows is dead — a NAT that silently dropped state, a load balancer that severed it without sending RST — is a slow source of "the first call after lunch always times out" production pain.
gRPC exposes a handful of knobs for managing this:
| Setting | Default | What it does |
|---|---|---|
| KEEPALIVE_TIME_MS | infinite (off) | Send a PING frame every N ms when the connection is idle (no streams). If no ACK in KEEPALIVE_TIMEOUT_MS, tear it down. |
| KEEPALIVE_TIMEOUT_MS | 20s | How long to wait for the PING ACK before assuming the peer is gone. |
| KEEPALIVE_PERMIT_WITHOUT_CALLS | false | Allow pings even when no RPCs are active. Required for long-idle clients. |
| MAX_CONNECTION_AGE_MS | infinite (off) | Server-side. Gracefully drain and close the connection after this age; forces clients to re-resolve DNS and rebalance. |
| MAX_CONNECTION_IDLE_MS | infinite (off) | Server-side. Close a connection that's been idle this long. Frees resources on the server. |
| MAX_CONCURRENT_STREAMS | implementation-dependent (100 is the minimum the HTTP/2 RFC recommends) | HTTP/2 setting. Server tells client how many concurrent streams it accepts. Saturating this leads to client-side queuing. |
A sensible production posture: clients set
KEEPALIVE_TIME_MS to 30-60 seconds and
KEEPALIVE_PERMIT_WITHOUT_CALLS true. Servers set
MAX_CONNECTION_AGE_MS to ~10 minutes (with a 30s grace) so
autoscaling churn actually moves traffic; idle timeouts to something generous
like 1 hour. Without MAX_CONNECTION_AGE_MS, scaling-out adds
capacity but existing clients pin to the original pods forever — a common
source of "I added pods but load didn't budge" surprise.
If a client pings more often than the server permits, the server answers with
GOAWAY: too many pings and tears down the connection. The server's
PING_STRIKES threshold (default 2) has to be coordinated with the client's
keepalive interval. Either set both explicitly or accept the defaults on both
ends.
Observability — tracing, channelz, stats handlers
Three layers of visibility matter, and gRPC supports all of them as first-class extension points rather than bolt-ons.
- Distributed tracing via OpenTelemetry. The otelgrpc interceptor (every major language has an equivalent) extracts the W3C traceparent header from incoming metadata, starts a child span, and injects the propagated context into outgoing calls. The result is one trace per cross-service RPC, with the gRPC method as the span name. This is the same shape as HTTP-tracing instrumentation, just plugged in via gRPC's interceptor seam.
- Channelz. A built-in introspection service every gRPC server can expose. It enumerates live channels, sub-channels, sockets, and per-stream statistics, which you'd otherwise have to scrape from kernel netstat plus application logs. Useful for debugging which backends a load-balanced channel is actually using and how many streams are in flight per connection.
- Stats handlers. A lower-level hook than interceptors. You see every begin/end of every RPC, every byte read and written, every header and trailer. Most service-mesh dataplanes plumb these into Prometheus counters via the language SDK; it's also the seam Stripe and Slack use to produce per-method latency histograms by status code.
Two metrics every gRPC service should expose: RPCs per second by status
code (so you can alert on a sudden surge of UNAVAILABLE
or DEADLINE_EXCEEDED) and p50/p99/p99.9 latency by
method (so you can localize a regression to a single endpoint). Both
fall out of the stats-handler hook with maybe 20 lines of code per language.
Sustained >1% of calls returning RESOURCE_EXHAUSTED means the server is
rate-limiting or the queue is full: page someone. Sustained >1% returning
UNAVAILABLE means the client can't reach the server at all: page someone. Both
are usually invisible at the HTTP layer because gRPC errors don't surface as
HTTP non-2xx; you have to instrument the gRPC stats layer specifically.
Code generation and the tooling stack
Protobuf and gRPC use a code-generation model: the .proto files describe the service contract, a code generator emits language-specific stubs and message classes, and the application code consumes those generated types. The generation step is the part most teams initially under-invest in and then spend disproportionate time fixing.
| Tool | Role | Notes |
|---|---|---|
| protoc | The reference code generator | Written in C++, ships separate plugins per language. The OG tool. Almost never used directly anymore: too low-level. |
| buf | Higher-level workflow tool | Wraps protoc with lint rules, breaking-change detection, BSR (Buf Schema Registry) for cross-team .proto distribution. The modern default. |
| protoc-gen-go / protoc-gen-go-grpc | Go code generators | The split between message generation and service generation is recent (~2020). Older codebases ship one combined plugin. |
| protoc-gen-grpc-java | Java generator | Produces enormous classes. Often paired with protoc-gen-grpc-kotlin for Kotlin idioms. |
| Tonic / Prost | Rust generator + runtime | The dominant Rust gRPC stack. Generates async-await idiomatic code; integrates with Tokio. |
| betterproto | Modern Python generator | Produces dataclasses, type hints, async-native. Replaces the old googleapis-common-protos style. |
| grpc-gateway | REST gateway from .proto | Reads HTTP annotations in the .proto, generates a reverse-proxy that turns REST calls into gRPC. The "single source of truth" pattern. |
| protoc-gen-validate (PGV) | Field-level validation | Reads validate.rules annotations, generates language-specific validators. Saves writing validation boilerplate. |
| connect-go / connect-web | Alternative gRPC + REST framework | Buf's stack. Same .proto, single binary speaks gRPC, gRPC-Web, and JSON over HTTP/1.1. Excellent for browser clients. |
A reasonable starting point in 2026: use buf for builds and lint,
buf breaking in CI to catch wire-incompat changes before they
merge, and language-specific generators behind it. If a public REST surface is
required, layer either grpc-gateway or Connect on top. Resist the temptation to
hand-write the gRPC client and server — the generator output is mechanical and
the value is in keeping the .proto as the only thing humans edit.
Run buf breaking --against the previous release in CI. It catches removed
fields, changed types, reused tag numbers: all the things that look fine in
code review but break clients already in the field. Many outages in the gRPC
ecosystem start with a wire-incompat change that the team didn't realise was
wire-incompat.
The Protobuf wire format, in 200 words
Protobuf encodes each field as (tag, wire-type, payload). The tag-and-wire-type combine into a single varint key; the wire type tells the parser how to read the payload that follows. Six wire types cover everything.
| Wire type | ID | Used for |
|---|---|---|
| VARINT | 0 | int32, int64, uint32, uint64, bool, enum — 1-10 bytes, MSB continuation bit |
| I64 | 1 | fixed64, sfixed64, double — exactly 8 bytes little-endian |
| LEN | 2 | string, bytes, embedded messages, packed repeated — varint length followed by payload |
| SGROUP / EGROUP | 3, 4 | Deprecated proto2 groups. Don't use. |
| I32 | 5 | fixed32, sfixed32, float — exactly 4 bytes little-endian |
Two implications worth knowing. First, signed ints (int32,
int64) encode negative numbers as 10-byte varints — twos-complement
is sign-extended to 64 bits first. If your field is mostly negative, use
sint32 (which zigzags before varint-encoding) to keep the wire
size small. Second, the VARINT continuation bit means parsing has to walk the
payload byte by byte; on hot paths, fixed-width fixed64 can be
faster despite using more bytes, because the decoder can do a single 8-byte
load.
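The size difference is easy to verify with Go's encoding/binary, whose unsigned varint uses the same base-128 layout as Protobuf. The helper names below are illustrative; the zigzag step is written out by hand so the mapping is visible.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen returns how many bytes v occupies as a plain varint, the
// encoding int32/int64 use after sign extension to 64 bits.
func varintLen(v int64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, uint64(v))
}

// zigzagLen returns the size under sint32/sint64 encoding, which maps
// 0, -1, 1, -2, ... to 0, 1, 2, 3, ... before the varint step.
func zigzagLen(v int64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	zz := uint64(v<<1) ^ uint64(v>>63)
	return binary.PutUvarint(buf, zz)
}

// fieldKey builds the key that precedes every field: (number << 3) | wire type.
func fieldKey(fieldNumber, wireType uint64) uint64 {
	return fieldNumber<<3 | wireType
}

func main() {
	for _, v := range []int64{1, 300, -1} {
		fmt.Printf("%4d: int64 takes %2d bytes, sint64 takes %d\n", v, varintLen(v), zigzagLen(v))
	}
	fmt.Printf("key for field 1, wire type LEN: 0x%02x\n", fieldKey(1, 2))
}
```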
Backpressure and flow control in streams
Streaming RPCs introduce a question unary calls never ask: what happens when
the producer is faster than the consumer? gRPC inherits HTTP/2's per-stream
flow control window — every receiver advertises a window size that the sender
may not exceed without an explicit WINDOW_UPDATE frame. That gives
you wire-level backpressure for free. The trap is that most language SDKs
expose a higher-level Send/Recv API that hides the wire, and so an application
can happily call stream.Send(...) a million times in a tight loop
and discover only later that the data is buffered in the SDK rather than
actually flowing.
| Layer | Mechanism | What you have to do |
|---|---|---|
| HTTP/2 flow control | Per-stream WINDOW_UPDATE frames; default window 64 KB | Nothing — handled by the SDK. Bump INITIAL_WINDOW_SIZE on high-throughput streams to amortise window-update cost. |
| SDK send buffer | Per-language buffer between the application Send call and the actual write | Check whether your SDK blocks on backpressure or buffers unboundedly. grpc-go blocks; some others queue. |
| Application-level pacing | Producer paces itself based on observed throughput | For unbounded streams (CDC, telemetry), add an explicit acknowledgement message every N records and pause when behind. |
| Receive buffer | The consumer's stream.Recv() rate sets the WINDOW_UPDATE cadence | If the consumer is slow, the window stalls, the sender blocks. This is the wanted behaviour — don't try to "fix" it by buffering on the server. |
For server-streaming RPCs (one-to-many), the typical anti-pattern is a server
that produces 100,000 records into a slow client and OOMs on the buffer. The
fix is to write the producer as a loop that blocks on each Send —
grpc-go's SendMsg blocks until the stream has window — and stop
producing eagerly. For bidirectional streams, application-level
acknowledgement messages (a "credits" pattern, à la TCP windows) give you
explicit control: the consumer grants N credits, the producer sends N messages,
waits for more credits, repeats. This is how Apache Pulsar and CockroachDB's
KV stream work.
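The credits pattern can be sketched with two channels standing in for the two directions of a bidirectional stream. This is a simplified in-process model, not gRPC code: produce and consume are hypothetical names, credits plays the role of the consumer's acknowledgement messages, and out plays the role of the data messages.

```go
package main

import "fmt"

// produce sends items on out, spending one credit per message. It blocks
// whenever it has no credits, so it never runs ahead of what was granted.
func produce(items []int, credits <-chan int, out chan<- int) {
	defer close(out)
	available := 0
	for _, it := range items {
		for available == 0 {
			available += <-credits // wait for the consumer to grant more
		}
		out <- it
		available--
	}
}

// consume grants batch credits, reads that many messages, and repeats
// until the producer closes the stream.
func consume(batch int, credits chan<- int, in <-chan int) []int {
	var got []int
	for {
		credits <- batch
		for i := 0; i < batch; i++ {
			v, ok := <-in
			if !ok {
				return got
			}
			got = append(got, v)
		}
	}
}

func main() {
	items := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	credits := make(chan int, 1) // room for one pending grant
	out := make(chan int)
	go produce(items, credits, out)
	fmt.Println(consume(4, credits, out))
}
```

The credits channel has a buffer of one so that a grant issued just as the producer finishes does not block the consumer forever.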
With the default window, a single stream tops out at 64 KB / RTT throughput,
roughly 6 MB/s over a 10 ms RTT. Bump InitialWindowSize (and
InitialConnWindowSize) to something like 4 MB if you're streaming GBs over a
WAN. It's two lines of SDK config and turns 6 MB/s into the actual line rate.
Local development and testing
gRPC pays back its setup cost most when integrated into a fast feedback loop — but it has a longer ramp than REST because there is no curl-equivalent shipped with the OS. Three tools fill the gap.
| Tool | What it does | When to reach for it |
|---|---|---|
| grpcurl | CLI client that uses reflection or a .proto file to make gRPC calls. The curl-equivalent. | Quick "does this RPC return what I expect" checks. Smoke tests in CI. |
| evans | Interactive REPL for gRPC services. Tab completion across methods and fields. | Exploring an unfamiliar service. Onboarding to a new codebase. |
| Postman / Insomnia | GUI clients with gRPC support since ~2022. | Cross-team API workshops; sharing collections with non-engineers. |
| buf curl | Buf's grpcurl-like with HTTP/JSON + gRPC + Connect support. | If your stack already uses Buf for builds. |
| gRPC reflection | Server-side feature that exposes the .proto schema over a special RPC. | Enable in dev/staging; disable or auth-gate in production (it's an attack-surface signal). |
For tests, prefer the in-process channel over a real TCP loopback. Every SDK
supports something like grpc-go's bufconn — a fake net.Listener
backed by a buffer — so unit tests can spin up a real server, dial it, and
exercise interceptors without binding a port. This makes tests parallelisable,
deterministic, and fast (sub-millisecond per call).
// Go — in-process gRPC server for tests
lis := bufconn.Listen(1024 * 1024)
srv := grpc.NewServer()
catalogpb.RegisterCatalogServiceServer(srv, &fakeCatalog{})
go srv.Serve(lis)
conn, _ := grpc.DialContext(ctx, "bufnet",
grpc.WithContextDialer(func(context.Context, string) (net.Conn, error) {
return lis.Dial()
}),
grpc.WithTransportCredentials(insecure.NewCredentials()),
)
client := catalogpb.NewCatalogServiceClient(conn)
Two operational rules pay back disproportionately. First, every new service
should ship with a grpcurl-callable health-check method
(grpc.health.v1.Health/Check) — the load balancer and the
deployment pipeline both want this. Second, write at least one end-to-end
integration test that exercises the full client SDK, not just an in-process
handler — there have been many production incidents caused by an interceptor
that only fires on the real network path (auth, tracing, retries) misbehaving
in ways the in-process test never saw.
When NOT to use gRPC
A short, opinionated list. gRPC is excellent for its core use case and routinely misapplied outside it.
- Public-facing APIs consumed by third parties. Discoverability is bad, browser support is via grpc-web (which loses bidirectional streaming and requires a proxy), and curl-ability is essentially nil. Use REST/JSON or GraphQL at the edge.
- Browser-to-server with significant client state. grpc-web works but the SDK is awkward and the streaming story is half-baked. Server-sent events or WebSockets are usually a better fit.
- Mostly-static configuration / reference data. If 95% of your requests are GETs of slow-moving data, the HTTP cache machinery (Cache-Control, ETag, CDN) is too valuable to give up. REST + a real CDN is faster end-to-end than gRPC bypassing the cache.
- When the team can't operate it. Debugging requires Envoy-aware tooling, observability needs proto-aware sampling, and language SDKs vary in maturity. If the org isn't ready for that surface area, JSON-over-HTTP is the lower total cost of ownership.
- When the message domain is text-heavy. Protobuf isn't a great fit for messages dominated by free-form prose (chat content, search documents). The wire savings are smaller and you lose the JSON ergonomics that text-heavy clients want.
Further reading
- gRPC Core Concepts: Official docs. Concepts, streaming modes, deadlines, interceptors. Read end to end.
- RFC 7540, HTTP/2: The substrate. Binary framing, multiplexed streams, HPACK header compression.
- Buf breaking-change linter: CI gate for gRPC schema evolution. Catches the mistakes Protobuf can't catch at build time.
- grpcurl + Reflection: curl for gRPC. With Reflection enabled, introspects the schema live. Indispensable for staging.
- Connect RPC: Buf's modern alternative. gRPC-compatible on the wire, browser-native on the client. Worth comparing before you commit.