Event ingestion
The canonical "ad-click counter" question, asked at billing-grade correctness. Every event has to land, every event has to be counted once, and the customer-facing dashboard has to be fresh enough to be useful. The shape of the answer is familiar — a Kafka tier, idempotency keys, a columnar store for rollups, a fraud filter — but the details are where the design earns its keep. Most candidates get the pipeline right and the accounting wrong.
1 · Clarifying questions
Five minutes. The correctness bar is the question that changes the architecture more than any other — pin it down first.
| Events per second at peak? | Assume 1M events/sec sustained, 5× burst at peak hours. That puts the ingest tier in "Kafka, not RabbitMQ" territory and forces sharded consumers. |
| What's the correctness bar? | Three tiers: estimate (analytics dashboards), audited (internal reporting), billing-grade (advertisers get invoiced from these numbers). Assume billing-grade — it dominates the design. |
| Fraud filtering required? | Yes. Click fraud is ~10–20% of raw traffic in ads; uncaught fraud directly inflates customer bills. Filter at ingest, refine in batch. |
| Freshness budget for the customer dashboard? | ~60 seconds for real-time numbers; final billing numbers can be hours late. Two tiers of freshness with different correctness bars. |
| Historical query horizon? | 90 days hot (sub-second queries), 5 years warm (minute-class queries), beyond that archived. Tiering drives most of the storage cost. |
| Customer query shape? | Mostly rollups by hour or day, scoped to a campaign. Top-K queries (top spending advertisers, top-converting creatives). Ad-hoc drill-down is rare but must work. |
| Schema stability? | Events evolve — new fraud signals, new device fields. Schema registry with backward-compatible evolution required. |
| Multi-tenant isolation? | Yes, soft. One advertiser's runaway campaign should not block another's ingest. Per-tenant quotas at the edge. |
2 · Capacity math, on a napkin
Storage is the dominant cost line; the math here decides whether the design is affordable. Run it before drawing boxes.
| Number | Calculation | Result |
|---|---|---|
| Events / sec (sustained) | given | 1M |
| Events / sec (peak) | 5× sustained | 5M |
| Bytes / event (avg) | raw JSON, post-validate | ~200 B |
| Ingest bandwidth | 1M × 200 B | ~200 MB/sec |
| Raw / day | 200 MB × 86,400 | ~17 TB |
| Raw / year | ×365 | ~6 PB |
| Columnar compressed | ~10× (typical ClickHouse) | ~600 TB / year |
| Kafka retention (7 days) | 17 TB × 7 × 3 (RF) | ~360 TB |
| Hot rollup cache (Redis) | ~1M campaigns × 24 hourly bins × 64 B | ~1.5 GB / day; ~50 GB at 30 days |
| Consumer parallelism | 200 MB/sec / 10 MB/sec per partition | ~64–128 partitions per topic |
6 PB/year of raw bytes is what shapes the architecture. Without 10× columnar compression the design isn't affordable. Without tiering (hot → warm → cold) the design isn't affordable at 5 years.
3 · API and data model
Batch ingest at the edge. Three storage shapes, each serving a different read pattern.
Endpoints
POST /v1/events # batch ingest from advertiser SDKs / pixel
{"events":[
{"event_id":"01HXY...", "advertiser_id":"A123", "campaign_id":"C9",
"click_ts":1715990400123, "user_hash":"sha256:...",
"ip":"203.0.113.7", "ua":"Mozilla/...", "referrer":"...",
"fraud_signals":{"bot_score":0.04, "vpn":false}}
]}
→ 202 {"accepted":N, "rejected":[{"event_id":"...", "reason":"schema"}]}
GET /v1/reports/clicks # customer-facing rollups
?campaign_id=C9&from=2026-05-17T00&to=2026-05-18T00&granularity=hour
→ 200 {"buckets":[{"ts":"2026-05-17T00", "clicks":12044, "billable":11890}, ...]}
GET /v1/reports/top # top-K within window
?advertiser_id=A123&metric=spend&k=10&window=24h
→ 200 {"top":[{"campaign_id":"C9", "spend":1240.55}, ...]}Schemas
-- raw events (immutable, append-only, columnar)
events_raw
event_id ULID PRIMARY KEY -- client-supplied; idempotency key
advertiser_id VARCHAR
campaign_id VARCHAR
click_ts TIMESTAMP(3) -- ms precision
ingest_ts TIMESTAMP(3) -- server time, for watermarking
user_hash CHAR(64)
ip INET
ua TEXT
referrer TEXT
fraud_signals JSONB -- bot_score, vpn, ip_reputation, ...
partition_key UINT32 -- hash(campaign_id) % 128
PARTITION BY toDate(click_ts), partition_key -- ClickHouse-style
-- aggregated rollups (the customer query target)
clicks_by_hour
campaign_id VARCHAR
hour TIMESTAMP -- truncated to hour, in UTC
clicks_total UINT64 -- including suspected fraud
clicks_billable UINT64 -- post fraud filter
spend_micros UINT64 -- billable × CPC, in micro-units
PRIMARY KEY (campaign_id, hour)
-- hot cache (Redis), TTL ~ 1 hour
rollup:<campaign_id>:<hour> → {clicks_total, clicks_billable, spend_micros}The event_id is the idempotency key — client-generated, ULID for time-ordering. Consumers dedupe on it. The partition_key spreads a single hot campaign across multiple Kafka partitions; without it, one viral campaign starves every other consumer on that partition.
4 · High-level architecture
Edge in front, Kafka in the middle, two-tier storage at the back. The real-time path gets the customer their numbers within a minute; the batch path corrects them.
Edge validates and timestamps. Kafka is the durable spine — every event lands here before any storage. The realtime consumer maintains Redis rollups for the dashboard; the batch job is the source of truth for billing. Fraud signals flow in asynchronously and update the batch rollups.
5 · Deep dive — exactly-once accounting
This is the part most candidates fumble. "Exactly-once" as a delivery guarantee is famously contentious; the production pattern is more honest about what's actually achievable.
Kafka gives at-least-once delivery by default — a consumer can crash after processing
a message but before committing its offset, and the message will be redelivered.
The accepted answer is at-least-once delivery plus idempotency at the
consumer. The client generates an event_id (a ULID); the
consumer's write is keyed on it; a duplicate write is a no-op. The result is
exactly-once effects, regardless of how many times the message is
delivered.
For billing-grade correctness, idempotency at one tier isn't enough. The pattern is two-tier reconciliation: the real-time path is approximate and optimised for freshness; the batch path is authoritative and optimised for correctness. The customer dashboard shows the real-time number, usually within a few percent of the truth. A scheduled batch job (hourly or daily) recomputes the rollups from the raw event log and overwrites the rollup table. Customer-visible numbers shift slightly when the batch runs; that shift is what gets invoiced.
Concretely: the consumer does a conditional write into ClickHouse keyed on
event_id (or maintains a Bloom filter + lookup table of recently-seen
IDs). The batch job runs SELECT COUNT(DISTINCT event_id) ... GROUP BY
campaign_id, hour from the raw table and overwrites clicks_by_hour.
The dashboard reads from clicks_by_hour with a fallback to Redis for
the current hour.
6 · Failure modes & runbook
Each of these has bitten a real ad network. The fixes are not exotic; the interesting part is recognising the failure before it hits production.
| Failure | Symptom | Mitigation |
|---|---|---|
| Duplicate events from client retries | Inflated click counts; advertiser over-billed | Client-generated event_id (ULID); consumer dedupes on it. Keep a rolling 7-day dedupe window in the consumer. |
| Out-of-order events (late arrivals) | Real-time rollup undercounts; batch path corrects | Windowed processing with a watermark — e.g. close the 14:00 hour window at 14:15 in real-time, but the batch job re-reads everything within 24 hours. |
| Kafka partition skew | One campaign with 1000× more clicks saturates its partition; everyone behind it lags | Partition by hash(campaign_id, sub_key) with a sub-key that spreads hot campaigns across 8–16 partitions. Or use a separate "hot tenants" topic. |
| Click fraud (bots, click farms) | Inflated billable counts; advertiser disputes invoice | Rate-limit per IP and per user_hash at the edge. Anomaly detector flags suspicious patterns. Mark in fraud_signals; exclude from clicks_billable. |
| Real-time / batch discrepancy | Dashboard number jumps when batch lands | Surface both numbers in the UI ("real-time: 12,044 · billable: 11,890 as of 14:00"). Cap the allowed discrepancy and alert oncall if > 5%. |
| ClickHouse node failure | One shard slow; query API P99 rises | Replication factor 2+ per shard; query router fails over. Hedge queries to replicas above 200 ms. |
| Schema-incompatible event | Consumer crashes on parse | Schema registry enforces backward compatibility on producer registration. Unparseable events go to a dead-letter topic, never block the live consumer. |
| S3 cold-tier query during business hours | Customer dashboard slow on 1-year drill-down | Pre-aggregate to day granularity for warm tier; sub-day queries against cold tier are async and async-emailed. |
7 · Cost & operability
Storage dominates. The ingest tier is CPU- and network-bound but cheap relative to the petabytes of retained data. A ballpark in 2026 cloud prices:
| Line | Estimate | Note |
|---|---|---|
| Ingest service (100 pods) | ~$5K / month | 4 vCPU × 8 GB; managed K8s |
| Kafka cluster (12 brokers, 7d retention) | ~$25K / month | ~360 TB on NVMe; cross-AZ replication |
| Realtime consumers + fraud | ~$8K / month | ~50 pods, 8 vCPU each |
| ClickHouse (90d hot, ~150 TB compressed) | ~$30K / month | 12-node cluster, NVMe-backed |
| S3 (warm, ~3 PB) | ~$60K / month | Standard tier; drops to ~$15K on Infrequent Access |
| S3 Glacier (cold, 5y archive) | ~$5K / month | Retrieval is async |
| Redis (hot rollups, 50 GB) | ~$2K / month | 3-replica cluster |
| Total | ~$135K / month | Dominated by warm storage; aggressive tiering cuts 30–50% |
Tiering policy
- 7 days hot. ClickHouse, sub-second queries, full row detail.
- 90 days warm. ClickHouse + Parquet on S3 Standard, queries via the same engine, second-class latency.
- 5 years cold. Parquet on S3 Glacier or equivalent. Pre-aggregated to day granularity. Retrieval is async; customer-facing queries beyond 90 days return "ready in N minutes" and email a link.
8 · Trade-offs & what's next
The last five minutes of the interview. The questions below are the ones a strong interviewer reaches for once the core design holds together.
| Question | How to answer |
|---|---|
| Kappa vs Lambda? | Lambda (separate real-time + batch paths) is what was described; Kappa (one stream-processing job that does both) is tidier but harder to operate. The honest answer in 2026: Kappa with Flink works for greenfield; Lambda is what most production systems still run because the batch path is easier to reason about for billing. |
| Schema evolution? | Avro or Protobuf with a schema registry (Confluent or self-hosted). Enforce backward compatibility on producer registration. Forward compatibility for consumers means new fields are optional and old consumers ignore them. |
| Real-time fraud vs batch fraud? | Real-time catches the obvious cases (rate-limits, known bad IPs) and protects ingest. Batch is where the heavy ML scores run — they're too expensive per-event but cheap in aggregate. Real-time prevents 80% of fraud spend; batch reclaims the rest in the daily reconciliation. |
| Per-tenant isolation? | Soft isolation via per-tenant rate limits at the edge and consumer-side quotas. Hard isolation (separate Kafka topics or clusters per large tenant) is reserved for the few customers whose volume justifies the operational overhead. |
| GDPR right-to-delete in a columnar store? | The hard part. Columnar stores are append-optimised; per-row delete is expensive. Standard pattern: tombstone the user_hash in a separate deletion log, filter at query time, and compact (rewrite the affected partitions) on a scheduled job. Some teams use partition-level deletes for users whose data fits a single partition. |
| What would you change at 10×? | Move the ingest tier to a sharded write-ahead log (e.g. Pulsar with tiered storage) so retention doesn't compete with throughput for disk. Push fraud scoring into the ingest pod so the consumer doesn't have to. Pre-aggregate at the producer for the highest-volume tenants. |
Further reading
- Jay Kreps — "Questioning the Lambda Architecture" (2014). The argument for Kappa, written by the author of Kafka. Worth reading before defending either choice in an interview.
- Confluent — "Exactly-Once Semantics in Apache Kafka". The most precise public account of what Kafka's "exactly-once" actually means and where the boundaries are.
- Google — "MillWheel: Fault-Tolerant Stream Processing at Internet Scale" (2013). Where the watermark idea was first written down clearly.
- ClickHouse — "Sparse Primary Indexes" docs. Why columnar stores can answer aggregation queries at the speeds they do.
- Stripe — "Idempotency Keys at Stripe". The pattern for client-generated idempotency keys; applies cleanly to event ingest.
- Adjacent: How Kafka works. The mechanics under the ingest tier.
- Adjacent: Consistent hashing. Background for the partition-key choice.
- Adjacent: Notifications playbook. Same Kafka-tier shape, different correctness bar.