A monolith is the right answer until it isn't. Knowing where the line is — and why it's there — is more valuable than any architecture diagram.
Before we discuss microservices, Kubernetes, or global CDNs, we must understand the humble starting point: the monolith. Most successful systems began as one. Knowing what it does well, where it stops, and the order in which the limits hit you saves you from premature decomposition (which costs you years) and from holding on too long (which costs you reliability). This module walks the monolith from "it just works" through every wall it eventually meets.
What a monolith is — and what makes it good
A monolith is a single application, deployed as one unit, with one codebase and one database. It's the cliché architecture diagram with one box labelled "the app." This sounds primitive next to microservices, but it has properties that microservices give up at considerable cost.
- One codebase, one deploy: "Find usages" works. Refactors are mechanical. Atomic commits change the API and every caller in one diff.
- One database, one transaction: ACID across the entire data model. No saga, no two-phase commit, no eventual-consistency reasoning.
- In-process function calls: Microsecond latency, no network failures, no serialisation, no protocol versioning.
- One observability surface: Logs, metrics, traces all live in one place. A bug is reproducible because there's one binary.
- One environment to run: A new engineer clones the repo, runs `make dev`, and is productive in an hour. Not five clusters and a service mesh.
These are the properties microservices spend years trying to win back. Do not give them up unless the monolith no longer fits.
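To make the "one database, one transaction" property concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` and `inventory` tables and the `place_order` helper are hypothetical names chosen for illustration; the point is only that two writes to different parts of the data model either both commit or both roll back, with no saga logic.

```python
import sqlite3

# Hypothetical schema: orders and inventory live in the same database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (
        sku   TEXT PRIMARY KEY,
        stock INTEGER NOT NULL CHECK (stock >= 0)
    );
    CREATE TABLE orders (
        id  INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
    INSERT INTO inventory VALUES ('widget', 5);
""")


def place_order(conn, sku, qty):
    """Record the order and decrement stock atomically; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE sku = ?", (qty, sku)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the order row is gone too.
        return False


ok = place_order(conn, "widget", 3)        # stock goes from 5 to 2
rejected = place_order(conn, "widget", 4)  # would go negative, so both writes roll back
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock = conn.execute("SELECT stock FROM inventory").fetchone()[0]
```

After the second call fails, only one order row exists and stock is still 2. Getting the same guarantee across two services with separate stores would need a saga or a distributed transaction.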
How far a monolith goes
A surprising answer: very far. Stack Overflow famously ran its entire web tier on nine "primary" web servers (backed by a handful of SQL Server, Redis, and Elasticsearch boxes) for years while serving billions of pageviews a month — and noted it could run on a single web server in a pinch. Shopify, Basecamp, GitHub all ran as monoliths well past the point most teams would have decomposed them. The numbers worth memorising:
| Tier | Approximate scale | What works |
|---|---|---|
| Single instance | ~100 RPS, ~10k users/day | SQLite or single Postgres on the same box. No cache. |
| Vertical-scaled monolith | ~1k RPS, ~1M users/day | One big DB box (db.r6.4xl class), one cache, two app instances. |
| Replicated monolith | ~10k RPS, ~10M users/day | Primary + read replicas, distributed cache, multi-AZ app fleet. |
| Sharded / extracted services | ~100k RPS, ~100M users/day | Now decomposition starts to pay. Hot services come out first. |
| Polyglot at scale | >1M RPS | Distinct service architecture, distinct stores, distinct teams. Welcome to the org chart problem. |
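If it helps to keep the table at hand during capacity discussions, it can be encoded as a small lookup. This is only a sketch: the function name `tier_for` is made up, the cut-offs are the order-of-magnitude figures from the table, and it assumes the "Sharded / extracted services" tier stretches up to the 1M RPS mark where the last tier begins.

```python
from bisect import bisect_left

# Approximate upper bounds in requests per second, taken from the table above.
# Assumption: the sharded tier covers everything up to 1M RPS, since the last
# tier is listed as ">1M RPS".
TIER_CEILINGS_RPS = [100, 1_000, 10_000, 1_000_000]
TIER_NAMES = [
    "Single instance",
    "Vertical-scaled monolith",
    "Replicated monolith",
    "Sharded / extracted services",
    "Polyglot at scale",
]


def tier_for(peak_rps: float) -> str:
    """Return the first tier whose approximate ceiling covers peak_rps."""
    return TIER_NAMES[bisect_left(TIER_CEILINGS_RPS, peak_rps)]
```

For example, `tier_for(800)` lands in the vertical-scaled tier, which is a reminder that one big database box and a cache are usually enough at that load.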
The walls — in the order they hit you
The walls aren't surprising; they hit in roughly the same order for every monolith. Knowing the order tells you which optimisation to reach for first.
The database is the first thing to saturate. Every request hits it, and queries slow down as tables grow. Mitigations: indexes, query optimisation, then a cache, then read replicas.
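The first mitigation, an index, can be seen in miniature with SQLite's query planner. The sketch below assumes a hypothetical `orders` table and an index named `idx_orders_customer_status`; it does not measure timings, it only shows the plan changing from a full table scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 500, "open" if i % 3 else "paid", float(i)) for i in range(10_000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ? AND status = ?"


def plan(conn, sql, params):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(conn, QUERY, (42, "open"))   # expected to be a full scan of orders

# Composite index matching both equality predicates in the WHERE clause.
conn.execute(
    "CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)"
)
after = plan(conn, QUERY, (42, "open"))    # expected to search via the new index
```

The exact plan text varies between SQLite versions, so print `before` and `after` rather than relying on a specific string. Postgres offers the same check through `EXPLAIN`.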
Reads scale via replicas; writes don't. When the primary's write IOPS or row-locks become the limit, you've reached vertical's ceiling. Mitigations: queue + async writers, then sharding.
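The "queue + async writers" mitigation can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `events` table, the `BATCH_SIZE` value, and an in-process `queue.Queue` standing in for a real broker. The idea it shows is that the request path only enqueues, while a single writer turns many small writes into a few batched transactions.

```python
import queue
import sqlite3
import threading

BATCH_SIZE = 100   # hypothetical tuning knob
_STOP = object()   # sentinel that tells the writer to flush and exit


def writer_loop(conn, q):
    """Drain events from the queue and commit them in batches."""
    batch = []
    while True:
        item = q.get()
        if item is not _STOP:
            batch.append(item)
        if batch and (item is _STOP or len(batch) >= BATCH_SIZE):
            with conn:  # one transaction per batch instead of one per event
                conn.executemany(
                    "INSERT INTO events (user_id, kind) VALUES (?, ?)", batch
                )
            batch = []
        if item is _STOP:
            return


# check_same_thread=False because the writer thread uses this connection;
# the main thread only touches it before start() and after join().
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)"
)

events = queue.Queue(maxsize=10_000)  # bounded, so producers feel back-pressure
writer = threading.Thread(target=writer_loop, args=(conn, events))
writer.start()

# The request path only enqueues; it never waits on the database.
for i in range(250):
    events.put((i, "page_view"))

events.put(_STOP)
writer.join()
written = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A production writer would also flush on a timer so a partial batch does not wait indefinitely, and would use a durable queue so that a crash does not lose accepted writes.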
50 engineers shipping the same binary means deploy queues, merge conflicts, and "I don't want to deploy on Friday." Mitigations: trunk-based development, feature flags, then extract the high-velocity edges as services.
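Feature flags are what let trunk-based development work: unfinished code merges and deploys, but stays dark. A minimal sketch of a percentage rollout follows; the flag names, the `FLAGS` table, and the `checkout` function are hypothetical, and real systems usually load flag values from a config service and not from a module-level dict.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users on the new path.
FLAGS = {"new_checkout": 10, "dark_mode": 100, "legacy_export": 0}


def is_enabled(flag: str, user_id: str, flags=FLAGS) -> bool:
    """Bucket the user deterministically into 0-99 and compare to the rollout."""
    rollout = flags.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout


def checkout(user_id: str) -> str:
    # Both code paths ship in the same deploy; the flag decides which one runs.
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Hashing the flag name together with the user ID keeps each user's experience stable across requests while giving different flags independent buckets.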
One bad migration takes down the whole product. One memory leak in the order code crashes the search code. Mitigations: bulkhead the hot paths into separate processes (still one repo), then extract them.
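Bulkheading into separate processes while staying in one repo often comes down to a single entrypoint that starts a different loop depending on a role argument. The sketch below assumes illustrative role names and placeholder functions that return a string where a real loop would run. Each role would be deployed as its own process group, so a crash or leak in one does not take down the others.

```python
import argparse


def run_api() -> str:
    return "serving HTTP requests"      # placeholder for the web server loop


def run_worker() -> str:
    return "consuming background jobs"  # placeholder for the job consumer loop


def run_cron() -> str:
    return "running scheduled tasks"    # placeholder for the scheduler loop


# Same codebase and same build artifact; only the role differs per process.
ROLES = {"api": run_api, "worker": run_worker, "cron": run_cron}


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="One codebase, several process roles")
    parser.add_argument("--role", choices=sorted(ROLES), required=True)
    args = parser.parse_args(argv)
    return ROLES[args.role]()


# Example: the process supervisor would start each role separately.
api_status = main(["--role", "api"])
worker_status = main(["--role", "worker"])
```

Because every role imports the same code, refactors and atomic commits still work as before; only the runtime failure domains are split.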
The ML team wants Python + GPUs, the realtime team wants Rust + io_uring, the rest of the team is on Node. Mitigations: extract the polyglot pieces as services; keep the core monolith on one stack.
Conway's Law strikes. 200 engineers on one codebase don't ship; 20 squads of 10 do. At this point decomposition is not an architecture choice — it's an organisational one.
The optimisations to exhaust before extracting
- Indexes: The cheapest change with the biggest payoff. The right composite index turns a 2-second query into a 5-millisecond one.
- Caching: Redis or Memcached in front of read-heavy queries. Cuts DB load 5-10× for typical workloads.
- Read replicas: Cheap read scaling. Send reports, dashboards, and any non-time-critical reads to replicas.
- Queue + worker: Move slow synchronous work (email, encoding, third-party calls) off the request path. Often a 5× P99 latency improvement.
- CDN: Push static assets and cacheable responses to the edge. Cuts both latency and origin load.
- Connection pooling: PgBouncer or RDS Proxy. Buys you headroom when the issue is "my Postgres is dropping connections," not "my Postgres is slow."
- Bulkheading inside the monolith: Run multiple instances of the same monolith with different roles ("api", "worker", "cron"). One process pool can't crash another. Cheap pre-microservices isolation.
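The caching item in the list above usually follows the cache-aside pattern: check the cache, fall through to the database on a miss, and invalidate on write. Here is a minimal sketch under stated assumptions: `TTLCache` is a tiny in-process stand-in for Redis or Memcached, and `load_profile_from_db`, `get_profile`, and `rename_user` are hypothetical helpers.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis or Memcached (illustration only)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def delete(self, key):
        self._store.pop(key, None)


stats = {"db_reads": 0}
cache = TTLCache(ttl_seconds=60)


def load_profile_from_db(user_id: int) -> dict:
    """Hypothetical slow query; counts how often the database is hit."""
    stats["db_reads"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_profile(user_id: int) -> dict:
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                        # hit: no database work
    profile = load_profile_from_db(user_id)  # miss: read through to the DB
    cache.set(key, profile)
    return profile


def rename_user(user_id: int, new_name: str) -> None:
    # A real implementation would write new_name to the database here,
    # then invalidate so the next read repopulates the cache.
    cache.delete(f"profile:{user_id}")


for _ in range(10):
    get_profile(7)
reads_after_ten_calls = stats["db_reads"]
```

Ten reads of the same profile cost one database query. The TTL bounds how stale an entry can get if an invalidation is missed, which is the usual trade-off to tune.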
Practical defaults
- Start as a monolith. One repo, one deploy, one database. Defend it.
- When traffic doubles, optimise SQL. When it 10×s, add a cache. When it 100×s, add replicas. When it 1000×s, talk about extraction.
- Extract a service when it has a different scaling profile, a different language need, a different ownership boundary, or a different reliability tier — not before.
- The first service you extract should be the most-changed and best-isolated. The last you should ever extract is the data model.
- Keep the monolith healthy even after extraction starts. It's the spine for years; treat it that way.