Monolith limits — System Design Handbook

A monolith is the right answer until it isn't. Knowing where the line is — and why it's there — is more valuable than any architecture diagram.

Before we discuss microservices, Kubernetes, or global CDNs, we must understand the humble starting point: the monolith. Most successful systems began as one. Knowing what it does well, where it stops, and the order in which the limits hit you saves you from premature decomposition (which costs you years) and from holding on too long (which costs you reliability). This module walks the monolith from "it just works" through every wall it eventually meets.

[Figure: monolith scaling curve] Most apps live on the part of the curve served by a cache, replicas, and queues. The wall, the moment vertical scaling stops working, is where decomposition becomes worth its cost. Not before.

What a monolith is — and what makes it good

A monolith is a single application, deployed as one unit, with one codebase and one database. It's the cliché architecture diagram with one box labelled "the app." This sounds primitive next to microservices, but it has properties that microservices give up at considerable cost.

One codebase, one deploy: "Find usages" works. Refactors are mechanical. Atomic commits change the API and every caller in one diff.
One database, one transaction: ACID across the entire data model. No saga, no two-phase commit, no eventual-consistency reasoning.
In-process function calls: Microsecond latency, no network failures, no serialisation, no protocol versioning.
One observability surface: Logs, metrics, traces all live in one place. A bug is reproducible because there's one binary.
One environment to run: A new engineer clones the repo, runs make dev, and is productive in an hour. Not five clusters and a service mesh.

These are the properties microservices spend years trying to win back. Do not give them up unless the monolith no longer fits.
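
As a minimal sketch of the "one database, one transaction" property, the following uses Python's built-in sqlite3 module as a stand-in for the monolith's database. The table names (orders, inventory) and the function place_order are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, sku TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Record the order and decrement stock; both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE sku = ?", (qty, sku)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the order row is gone too.
        return False
```

Calling place_order(conn, "widget", 10) against a stock of 5 returns False and leaves no orphaned order row. Across two services, the same guarantee would need a saga or similar compensation logic.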

How far a monolith goes

A surprising answer: very far. Stack Overflow famously ran its entire web tier on nine "primary" web servers (backed by a handful of SQL Server, Redis, and Elasticsearch boxes) for years while serving billions of pageviews a month, and noted it could run on a single web server in a pinch. Shopify, Basecamp, and GitHub all ran as monoliths well past the point most teams would have decomposed them. The numbers worth memorising:

Tier	Approximate scale	What works
Single instance	~100 RPS, ~10k users/day	SQLite or single Postgres on the same box. No cache.
Vertical-scaled monolith	~1k RPS, ~1M users/day	One big DB box (db.r6g.4xlarge class), one cache, two app instances.
Replicated monolith	~10k RPS, ~10M users/day	Primary + read replicas, distributed cache, multi-AZ app fleet.
Sharded / extracted services	~100k RPS, ~100M users/day	Now decomposition starts to pay. Hot services come out first.
Polyglot at scale	>1M RPS	Distinct service architecture, distinct stores, distinct teams. Welcome to the org chart problem.
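
The table can be read as a lookup from load to tier. The sketch below encodes it that way, under two assumptions that are not in the table itself: each tier covers loads up to its approximate scale, and the unspecified band between 100k and 1M RPS is assigned to the sharded tier. The name tier_for is hypothetical.

```python
# Approximate RPS ceilings from the table; these are rough orders of
# magnitude, not hard limits.
TIERS = [
    (100, "Single instance"),
    (1_000, "Vertical-scaled monolith"),
    (10_000, "Replicated monolith"),
    # Assumption: the table leaves 100k to 1M RPS unspecified, so that band
    # is treated as part of the sharded tier here.
    (1_000_000, "Sharded / extracted services"),
]


def tier_for(rps: float) -> str:
    """Return the name of the first tier whose ceiling covers the given load."""
    for ceiling, name in TIERS:
        if rps <= ceiling:
            return name
    return "Polyglot at scale"
```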

The walls — in the order they hit you

The walls aren't surprising; they hit in roughly the same order for every monolith. Knowing the order tells you which optimisation to reach for first.

Wall 1 · Database CPU

The first thing to saturate. Every request hits the DB; queries get slow as tables grow. Mitigations: indexes, query optimisation, then a cache, then read replicas.
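
To illustrate the first mitigation, here is a small sketch using SQLite (via Python's sqlite3 module) as a stand-in for the production database. It asks the planner how it will run a query before and after adding a composite index. The orders table, its columns, and the index name are hypothetical, and a real Postgres or MySQL planner reports its plans differently.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column is the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 100, "open" if i % 3 else "closed") for i in range(1000)],
)

sql = "SELECT id FROM orders WHERE customer_id = ? AND status = ?"

# Without an index the planner has to scan every row.
before = query_plan(conn, sql, (42, "open"))

# A composite index on both filter columns lets it seek directly.
conn.execute(
    "CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)"
)
after = query_plan(conn, sql, (42, "open"))
```

Printing before and after shows the plan change from a table scan to a search that names the new index. Checking the plan this way is usually cheaper than guessing which index a slow query needs.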

Wall 2 · Database write throughput

Reads scale via replicas; writes don't. When the primary's write IOPS or row locks become the limit, you've reached the ceiling of vertical scaling. Mitigations: queue + async writers, then sharding.
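
The "queue + async writers" mitigation can be sketched as a single writer thread that drains a queue and commits in batches, so the request path only enqueues. This is an illustration under stated assumptions: an in-process queue.Queue stands in for a durable broker (an in-process queue loses buffered writes if the process crashes), SQLite stands in for the primary, and the names writer_loop, run_demo, and the events table are hypothetical.

```python
import os
import queue
import sqlite3
import tempfile
import threading

_STOP = object()  # sentinel telling the writer to flush and exit


def writer_loop(db_path, q, batch_size=100):
    """Single writer: pull events off the queue and commit them in batches."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")
    done = False
    while not done:
        batch = []
        item = q.get()  # block until at least one item arrives
        while True:
            if item is _STOP:
                done = True
                break
            batch.append((item,))
            if len(batch) >= batch_size:
                break
            try:
                item = q.get_nowait()  # take whatever else is already queued
            except queue.Empty:
                break
        if batch:
            with conn:  # one transaction per batch instead of one per event
                conn.executemany("INSERT INTO events (payload) VALUES (?)", batch)
    conn.close()


def run_demo(n=250):
    """Enqueue n events from the 'request path' and return how many were stored."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "events.db")
        q = queue.Queue()
        writer = threading.Thread(target=writer_loop, args=(db_path, q))
        writer.start()
        for i in range(n):
            q.put(f"event-{i}")  # enqueue and return immediately
        q.put(_STOP)
        writer.join()
        conn = sqlite3.connect(db_path)
        count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.close()
        return count
```

Batching trades a little write latency for far fewer transactions on the primary, which is the point of the mitigation.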

Wall 3 · Deploy contention

Fifty engineers shipping the same binary means deploy queues, merge conflicts, and "I don't want to deploy on Friday." Mitigations: trunk-based development, feature flags, then extract the high-velocity edges as services.
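
Feature flags are what let unfinished work merge to trunk and ship dark. A minimal sketch of a percentage rollout follows, assuming flags are identified by name and users by a string id; the function is_enabled and the flag name in the usage note are hypothetical, and a real system would read the rollout percentage from configuration rather than pass it in.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in a bucket 0-99 and compare to the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

For example, is_enabled("new_checkout", "user-17", 10) gives the same answer on every call for that user, so a user does not flip between old and new code paths while the rollout percentage stays fixed.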

Wall 4 · Single points of failure

One bad migration takes down the whole product. One memory leak in the order code crashes the search code. Mitigations: bulkhead the hot paths into separate processes (still one repo), then extract them.
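
Bulkheading into separate processes while keeping one repo can be as simple as one entrypoint that selects a role at start-up. The sketch below assumes three illustrative roles and stub functions in place of real server, worker, and scheduler loops; all names are hypothetical.

```python
def run_api() -> str:
    return "serving HTTP requests"      # stub for the web server loop


def run_worker() -> str:
    return "consuming background jobs"  # stub for the job consumer loop


def run_cron() -> str:
    return "running scheduled tasks"    # stub for the scheduler loop


ROLES = {"api": run_api, "worker": run_worker, "cron": run_cron}


def main(argv: list[str]) -> str:
    """Start the same codebase in the role named by the first argument."""
    role = argv[1] if len(argv) > 1 else "api"
    try:
        entrypoint = ROLES[role]
    except KeyError:
        raise SystemExit(f"unknown role {role!r}; expected one of {sorted(ROLES)}")
    return entrypoint()
```

Each role then runs as its own process or container, for example by calling main(sys.argv) from a small launcher script, so a leak in the worker role cannot take down the processes serving API traffic.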

Wall 5 · Polyglot needs

The ML team wants Python + GPUs, the realtime team wants Rust + io_uring, the rest of the team is on Node. Mitigations: extract the polyglot pieces as services; keep the core monolith on one stack.

Wall 6 · Org chart

Conway's Law strikes. 200 engineers on one codebase don't ship; 20 squads of 10 do. At this point decomposition is not an architecture choice — it's an organisational one.

The optimisations to exhaust before extracting

Indexes: The cheapest change with the biggest payoff. The right composite index can turn a 2-second query into a 5-millisecond one.
Caching: Redis or Memcached in front of read-heavy queries. Can cut DB load 5-10× for typical workloads.
Read replicas: Cheap read scaling. Send reports, dashboards, and any non-time-critical reads to replicas.
Queue + worker: Move slow synchronous work (email, encoding, third-party calls) off the request path. Often a 5× P99 latency improvement.
CDN: Push static assets and cacheable responses to the edge. Cuts both latency and origin load.
Connection pooling: PgBouncer or RDS Proxy. Buys you headroom when the issue is "my Postgres is dropping connections," not "my Postgres is slow."
Bulkheading inside the monolith: Run multiple instances of the same monolith with different roles ("api", "worker", "cron"). One process pool can't crash another. Cheap pre-microservices isolation.
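
Of the optimisations above, caching is the one most often sketched wrong, so here is a minimal cache-aside example. It assumes an in-process dictionary with a time-to-live in place of Redis or Memcached, and a caller-supplied loader function in place of the real database query; TTLCache and get_or_load are hypothetical names.

```python
import time


class TTLCache:
    """Cache-aside helper: serve from memory, fall back to a loader on a miss."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._store = {}         # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # fresh entry: no database work
        value = loader(key)      # miss or expired: go to the database
        self._store[key] = (now + self._ttl, value)
        return value
```

The TTL bounds how stale a read can be. Writes that must be visible immediately would also need to delete or overwrite the cached key, which this sketch leaves out.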

The hard cases

Premature microservices. Splitting a 100k-line app into 30 services before you've hit any of the walls is the most common path to a system nobody enjoys working on. You inherit all of microservices' costs (network, distributed tracing, deployment coordination, schema versioning) and none of their benefits. Default to monolith; split when a wall demands it.

The "modular monolith" trap. Code organised into modules with explicit interfaces is a great practice. But "modular monolith" sometimes becomes "we can extract any module into a service later" — and then nobody does, because the modules secretly share state through the database. Be honest about cross-module DB writes; they're the real boundary.
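
One way to be honest about cross-module writes is to list which module writes to which table and flag every table with more than one writer. The sketch below assumes that list has already been gathered somehow (from code review, query logs, or ORM metadata); the module and table names are invented for illustration.

```python
from collections import defaultdict


def shared_write_tables(writes):
    """Map each table written by more than one module to its sorted writers."""
    writers = defaultdict(set)
    for module, table in writes:
        writers[table].add(module)
    return {t: sorted(mods) for t, mods in writers.items() if len(mods) > 1}


# Hypothetical audit data: (module, table it writes to).
WRITES = [
    ("billing", "invoices"),
    ("orders", "orders"),
    ("orders", "invoices"),      # orders reaches into billing's table
    ("search", "search_index"),
]
```

Here shared_write_tables(WRITES) reports that invoices is written by both billing and orders, so neither module can be extracted cleanly until that write goes through one owner.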

Decomposition order matters. If you must extract services, extract by stability: the rarely-changing core stays in the monolith, and the high-velocity, well-isolated edges come out. Extracting a part that still changes in lockstep with the core means coordinating cross-repo changes daily, which is exactly the cost microservices were supposed to avoid.

Practical defaults

Start as a monolith. One repo, one deploy, one database. Defend it.
When traffic doubles, optimise SQL. When it 10×s, add a cache. When it 100×s, add replicas. When it 1000×s, talk about extraction.
Extract a service when it has a different scaling profile, a different language need, a different ownership boundary, or a different reliability tier — not before.
The first service you extract should be the most-changed and best-isolated. The last thing you should ever extract is the data model.
Keep the monolith healthy even after extraction starts. It's the spine for years; treat it that way.
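
The extraction rule above can be written down as a checklist, which makes it harder to extract a service for no stated reason. This is only a sketch: the Candidate class and its field names are hypothetical, and each flag is a human judgment rather than something measured.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A module being considered for extraction; each flag is a judgment call."""
    name: str
    different_scaling_profile: bool = False
    different_language_need: bool = False
    different_ownership_boundary: bool = False
    different_reliability_tier: bool = False


def reasons_to_extract(c: Candidate) -> list[str]:
    """Return the criteria the candidate meets; an empty list means keep it in."""
    checks = [
        ("scaling profile", c.different_scaling_profile),
        ("language need", c.different_language_need),
        ("ownership boundary", c.different_ownership_boundary),
        ("reliability tier", c.different_reliability_tier),
    ]
    return [label for label, met in checks if met]
```

A candidate with no reasons stays in the monolith; one such as Candidate("thumbnailer", different_scaling_profile=True) has at least one concrete justification to discuss.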
