System Design Reference Sheet¶

Load Balancing¶

Strategy	How It Works	Best For
Round Robin	Rotate through servers sequentially	Homogeneous servers, stateless services
Weighted Round Robin	Rotate with weights per server capacity	Heterogeneous servers
Least Connections	Route to server with fewest active connections	Long-lived connections, varying request cost
Consistent Hashing	Hash key to ring position; route to next node	Caches, sharded datastores, minimize reshuffling
IP Hash	Hash client IP to pin to a server	Session affinity without sticky cookies

Consistent hashing detail: virtual nodes (vnodes) spread load more evenly. Adding/removing a node only moves K/N keys on average.

Caching Patterns¶

Pattern	Flow	Tradeoff
Cache-aside (lazy)	App checks cache -> miss -> read DB -> populate cache	Simple; stale reads possible
Write-through	App writes cache + DB together	Strong consistency; higher write latency
Write-behind (write-back)	App writes cache; cache async-writes DB	Low write latency; risk of data loss
Read-through	Cache itself fetches from DB on miss	Cleaner app code; cache becomes a dependency

Key knobs: - TTL: balance freshness vs hit rate. Short TTL = more DB hits. Long TTL = stale data. - Eviction: LRU (most common), LFU (frequency-based), FIFO. - Cache stampede: use locking or probabilistic early expiration. - Invalidation: event-driven (pub/sub on writes) or TTL-based.

Message Queues¶

Kafka¶

Append-only log, partitioned by topic.
Consumer groups for parallel processing; each partition consumed by one consumer per group.
Retention-based (not delete-on-consume); enables replay.
Use case: high-throughput event streaming, audit logs, real-time pipelines.

SQS¶

Fully managed, no partition management.
Standard queue: at-least-once, best-effort ordering.
FIFO queue: exactly-once, strict ordering (lower throughput).
Visibility timeout: message invisible to other consumers while being processed.

Dead Letter Queues (DLQ)¶

Route messages that fail processing N times.
Always set maxReceiveCount. Monitor DLQ depth as an alert.
Reprocess: move DLQ messages back to main queue after fixing the consumer.

Database Design¶

SQL vs NoSQL Tradeoffs¶

	SQL (Postgres, MySQL)	NoSQL (DynamoDB, Mongo, Cassandra)
Schema	Rigid, enforced	Flexible, schema-on-read
Joins	Native	Manual (denormalize or app-side)
Transactions	ACID	Varies (eventual consistency common)
Scale model	Vertical first; read replicas	Horizontal (partition/shard natively)
Best for	Complex queries, relationships	High write throughput, simple access patterns

Sharding Strategies¶

Range-based: shard by key range (e.g., A-M, N-Z). Risk: hot shards.
Hash-based: hash(key) % N. Even distribution; range queries require scatter-gather.
Directory-based: lookup table maps key to shard. Flexible; lookup is a bottleneck.

Replication¶

Leader-follower: one writer, N readers. Simple; replication lag on reads.
Multi-leader: multiple writers. Good for geo-distributed; conflict resolution needed.
Leaderless (Dynamo-style): quorum reads/writes (R + W > N). Tunable consistency.

Distributed Systems Patterns¶

CAP Theorem¶

Pick two of three: Consistency, Availability, Partition tolerance. In practice, partitions happen, so choose CP (reject requests during partition) or AP (serve stale data during partition).

Consensus¶

Raft: leader election + log replication. Understandable. Used in etcd, Consul.
Paxos: more general, harder to implement.
Use case: metadata stores, config management, leader election.

Eventual Consistency¶

Writes propagate asynchronously; reads may see stale data.
Conflict resolution: last-writer-wins (LWW), vector clocks, CRDTs.
Read-your-writes: route reads to the same replica that handled the write.

Other Patterns¶

Circuit breaker: fail fast when downstream is unhealthy. States: closed -> open -> half-open.
Bulkhead: isolate failures by partitioning resources (thread pools, connections).
Saga: distributed transaction as a sequence of local transactions + compensating actions.
Idempotency: design operations so retries are safe. Use idempotency keys.

ASI-Relevant: Real-Time & Geospatial¶

Real-Time Data Ingestion¶

Pattern: Source -> Message broker (Kafka) -> Stream processor -> Store + Serve
Backpressure: if consumer can't keep up, use bounded queues + drop/sample policy.
Exactly-once: Kafka transactions + idempotent producers; or dedup at consumer.
Fan-out: one event triggers multiple consumers (Kafka consumer groups, SNS -> SQS).

Geospatial Indexing¶

Structure	How It Works	Best For
R-tree	Bounding rectangles in a balanced tree	Range queries, nearest neighbor on polygons
Geohash	Encode lat/lng to a string; prefix = coarser area	Proximity search via prefix matching; easy to shard
H3 (Uber)	Hexagonal hierarchical grid over the globe	Uniform area cells; good for aggregation and analysis
Quad-tree	Recursively subdivide 2D space into 4 quadrants	Point data, adaptive resolution
KD-tree	Binary tree splitting on alternating dimensions	Nearest-neighbor in low dimensions

PostGIS: Postgres extension for geospatial. Supports R-tree indexes (GiST), ST_Distance, ST_Within, ST_Intersects.

Stream Processing¶

Windowing: tumbling (fixed, non-overlapping), sliding (overlapping), session (gap-based).
Watermarks: track event-time progress; handle late-arriving data.
Frameworks: Apache Flink (true streaming), Spark Structured Streaming (micro-batch), Kafka Streams (library, no cluster).

API Design¶

REST vs gRPC¶

	REST	gRPC
Format	JSON (text)	Protobuf (binary)
Transport	HTTP/1.1 or 2	HTTP/2 (multiplexed)
Streaming	Limited (SSE, WebSocket)	Native bidirectional streaming
Tooling	Universal (curl, browser)	Needs codegen; harder to debug
Best for	Public APIs, web clients	Internal microservices, low-latency

Pagination¶

Offset-based: ?page=3&size=20. Simple; skips rows on every request (slow at depth).
Cursor-based: ?cursor=abc123&size=20. Encode last-seen ID/timestamp. Stable under inserts.
Keyset: WHERE id > last_id ORDER BY id LIMIT 20. Most efficient for SQL.

Rate Limiting¶

Token bucket: tokens refill at fixed rate; request costs 1 token. Allows bursts.
Sliding window log: store timestamps; count in window. Precise; memory-heavy.
Sliding window counter: combine fixed window counts with interpolation. Good tradeoff.
Implement at API gateway (Kong, AWS API Gateway) or app level (Redis + Lua script).

Monitoring & Observability¶

Three Pillars¶

Pillar	What	Tools
Metrics	Numeric time-series (latency p99, error rate, throughput)	Prometheus, Grafana, CloudWatch
Logs	Structured event records	ELK (Elasticsearch + Logstash + Kibana), Loki
Traces	Request flow across services	Jaeger, Zipkin, AWS X-Ray, OpenTelemetry

Key Metrics (RED Method)¶

Rate: requests per second.
Errors: error count or error rate.
Duration: latency distribution (p50, p95, p99).

Alerting¶

Alert on symptoms (high error rate), not causes (CPU usage).
Use multi-window, multi-burn-rate alerts to avoid flapping.
Runbooks: every alert should link to a doc explaining what to check and how to mitigate.

SLIs/SLOs/SLAs¶

SLI (indicator): measured metric (e.g., "99.2% of requests < 200ms").
SLO (objective): target for the SLI (e.g., "99.9% availability per month").
SLA (agreement): contractual promise with consequences for breach.
Error budget: 100% - SLO = budget for experimentation/deploys.