Foreword

Hiring Company Perspective

System design interviews have been widely adopted because the communication and problem-solving skills they test are similar to those a software engineer needs in daily work.

The interviewee is evaluated on how they analyze a vague problem and solve it step by step.

They are also evaluated on how they explain their ideas, discuss them with others, and evaluate and optimize the system.

System Design Questions

The questions are usually open-ended, just like in the real world, and the desired outcome is an architecture that achieves the intended design goals.

Interviewers may ask for a high-level architecture that covers all aspects, or pick one or more areas to focus on.

The intended design goals are the requirements, constraints, and bottlenecks that shape the direction of both the interviewer and interviewee.

Scope Of This Book

The objective is to provide a reliable strategy to approach the system design questions and to solidify the strategy and knowledge needed for a successful interview.

Chapter 1: Scale From Zero to Millions of Users

Intro

Basic techniques and knowledge for system design

Single Server Setup: Starting Simple

Intro

Let's start simple and run everything on a single server. Here, the application code, database, cache, etc. all run on the same machine.

Single Server: Processing Steps

Client -> DNS: Domain Name

Client: A client is any software or device that initiates requests to a server or service over a network. Examples include web browsers, mobile apps, desktop programs, or other services consuming an API. Clients present data to users or other systems and rely on servers to process requests and return results.

DNS: Users access websites through domain names (e.g., www.google.com). The Domain Name System (DNS) resolves those domain names into IP addresses (e.g., 140.250.190.78). DNS is usually provided by third parties (registrars, hosting providers, dedicated DNS services) rather than hosted on the application's own servers.

Domain Name: A domain name is the human-readable address of a resource on the internet (e.g., www.example.com). It acts as an alias for an IP address, making it easier for users to find, remember, and access services without remembering long IP addresses. Domain names are registered through registrars and resolved to IP addresses via DNS.

DNS -> Client: IP Address

IP Address: An Internet Protocol (IP) address is the numerical label assigned to a device on a network. It is returned to the browser or mobile app after a DNS lookup so the client knows where to send requests.

Client -> Web Server: HTTP Request + JSON Body

Once the client receives the IP address, it sends a Hypertext Transfer Protocol (HTTP) request, often with a JSON body, to the server.

HTTP Protocol: Hypertext Transfer Protocol (HTTP) is an application-layer protocol used for communication between clients (e.g., browsers, mobile apps) and servers on the web. It defines how requests and responses are formatted and transmitted over TCP/IP networks.

HTTP itself does not define the format of the data being sent; it only provides a way to transport it (e.g., HTML, JSON, images, binary files).

    // Request Line -> Method Path HTTP/Version
    GET /index.html HTTP/1.1
    // Headers -> key value pairs describing the request
    Host: www.example.com
    Accept: text/html

HTTP Request/Response: An HTTP request is a message a client sends to a server, asking for a resource or performing an action (e.g., GET /index.html).

An HTTP response is the server's reply, which includes a status code (e.g., 200 OK), headers, and an optional body.

The body of an HTTP message can contain many kinds of data: HTML for web pages, JSON for APIs, or other formats depending on the endpoint.

    // HTTP Request
    GET /about HTTP/1.1
    Host: www.example.com
    Accept: text/html

    // HTTP Request with JSON Body
    POST /api/users HTTP/1.1
    Host: api.example.com
    Content-Type: application/json
    Authorization: Bearer <token>

    {
        "username": "alice",
        "email": "alice@example.com",
        "password": "securePassword123"
    }

    // HTTP Response with HTML Body
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8

    <!DOCTYPE html>
    <html>
    <body>
        <h1>Hello World!</h1>
    </body>
    </html>

    // HTTP Response with JSON Body
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
        "id" : 12,
        "firstName": "John",
        "lastName": "Smith",
        "address": {
            "streetAddress": "21 2nd Street",
            "city" : "New York",
            "state": "NY",
            "postalCode": 10021
        },
        "phoneNumber" : "212 555-1234"
    }
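
The messages above all share one layout: a start line, header lines, a blank line, then an optional body. The following is a minimal Python sketch of splitting a raw message along those boundaries. The function name `parse_http_message` is made up for this illustration, and real servers and clients use a full HTTP library instead.

```python
import json

def parse_http_message(raw: str):
    """Split a raw HTTP/1.1 message into start line, headers, and body."""
    # Headers and body are separated by the first blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"id": 12, "firstName": "John"}'
)

start_line, headers, body = parse_http_message(raw_response)
# HTTP only transports the body; the Content-Type header says how to decode it.
if headers.get("content-type", "").startswith("application/json"):
    payload = json.loads(body)
else:
    payload = body
```

Here the body is decoded as JSON only because the Content-Type header says so, which is the sense in which HTTP does not define the data format itself.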

Web Server -> Client: HTML/JSON Response

HTML Response: An HTML response is an HTTP response with an HTML body. It's used by web servers to deliver full web pages that browsers can render for users.

    // HTTP Response with HTML Body
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8

    // HTML Body
    <!DOCTYPE html>
    <html>
    <head>
        <title>Welcome</title>
    </head>
    <body>
        <h1>Hello, Jane!</h1>
    </body>
    </html>

Diagram: Single Server Setup

                                              1) ------>
               +-------------------------+   api.mysite.com     +--------+
               | Web Browser, Mobile App |                      |  DNS   |     
               +-------------------------+                      +--------+    
                        |        ^            2) <------    
                        |        |           15.125.23.214     
                        |        |       
    3)                  |        |  4)    
    15.125.23.214       |        |  HTML Page 
                        |        |    
                        |        |       
                        |        |      
                        V        |
                    +------------------+
                    |   Web Server     |
                    +------------------+

Traffic Sources: Clients/Users

Clients create traffic via web or mobile applications.

Web Application: A web application is client software that runs in a browser. It uses client side languages for the UI (e.g., HTML, CSS, JavaScript) and server side languages (e.g., Java, Python, Node.js) to handle logic, data storage, and processing. Web applications are accessible through URLs and do not require installation on a user's device.

Mobile Application: A mobile application is client software designed to run on smartphones or tablets. It communicates with servers over HTTP, often with JSON as the body format. A mobile app combines a UI on the device (built with native toolkits or web technologies such as HTML, CSS, and JavaScript) with a separate backend service that provides the functionality.

API

API: An API (Application Programming Interface) is a set of rules, or a defined contract, that allows different software applications to communicate with each other.

Rules:

Which endpoints exist
Which inputs are valid
Expected outputs and errors

This contract allows different systems, such as a client and a server, to work together.

Request: A client/user sends a request through the API
Processing: The API forwards the request to the server
Response: The server processes the request and sends the response back to the API
Delivery: The API returns the server's response to the client
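
As a rough illustration of such a contract, the sketch below defines which endpoints exist, which inputs are valid, and which outputs and errors come back. Everything in it (the `/api/users` route, the `USERS` dict, the status codes chosen) is a hypothetical assumption for the example, not a prescribed design.

```python
# Hypothetical in-memory "backend" used only for this illustration.
USERS = {}

def create_user(body):
    # Valid input: non-empty "username" and "email" strings.
    for field in ("username", "email"):
        if not isinstance(body.get(field), str) or not body[field]:
            return 400, {"error": f"missing or invalid field: {field}"}
    user_id = len(USERS) + 1
    USERS[user_id] = {
        "id": user_id,
        "username": body["username"],
        "email": body["email"],
    }
    # Expected output: the created user with a 201 status.
    return 201, USERS[user_id]

# The contract: which endpoints exist and which handler serves each one.
ROUTES = {("POST", "/api/users"): create_user}

def handle_request(method, path, body=None):
    handler = ROUTES.get((method, path))
    if handler is None:
        # Expected error for anything outside the contract.
        return 404, {"error": "unknown endpoint"}
    return handler(body or {})
```

A client only needs to know the route table and the input rules; how the server stores users can change without breaking the contract.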

Separating Traffic and Database

Intro

As more users use an application, one server is no longer enough for the traffic, so we need multiple servers: let's say one for web/mobile traffic and another to host the database. Separating logic from storage allows them to be scaled independently.

Diagram: Separating Traffic and Database

                                              1) ------>
               +-------------------------+   api.mysite.com     +--------+
               | Web Browser, Mobile App |                      |  DNS   |     
               +-------------------------+                      +--------+    
                        |        ^            2) <------    
                        |        |           15.125.23.214     
                        |        |       
    3)                  |        |  4)    
    15.125.23.214       |        |  HTML Page 
                        |        |    
                        |        |       
                        |        |      
                        V        |          5) ------>
                    +----------------+    read/write/update  +------------+
                    |   Web Server   |                       |  Database  |
                    +----------------+     6) <------        +------------+
                                            return data

Which Database To Use?

Intro

The type of database in a system directly impacts performance, scalability, and how data can be queried and maintained.

To give some examples, relational databases are ideal for data with a consistent shape, while non-relational databases are better for dynamic schemas.

In relational databases, queries can become slow when they need frequent joins across tables, while non-relational databases are often faster for simple lookups because related data is stored together and no joins are necessary.

There are more differences that we will dive into as we go.

Relational Database

Relational Database: A relational database (RDB) is a type of database that stores data in tables with rows and columns. Each table has a schema that defines its structure, and tables can be related to one another through keys (primary keys, foreign keys). Relational databases support SQL (Structured Query Language) for querying and managing data. They are ideal for structured data with well-defined relationships.

    -- Example: Relational Database Table
    CREATE TABLE Users (
        id INT PRIMARY KEY,
        firstName VARCHAR(50),
        lastName VARCHAR(50),
        email VARCHAR(100)
    );
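
To make keys and joins concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `Orders` table and its rows are hypothetical additions for the example; only `Users` comes from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE Users (
        id INTEGER PRIMARY KEY,
        firstName VARCHAR(50),
        lastName VARCHAR(50),
        email VARCHAR(100)
    );
    -- Orders is a hypothetical second table, added to show a foreign key.
    CREATE TABLE Orders (
        id INTEGER PRIMARY KEY,
        userId INTEGER REFERENCES Users(id),
        item VARCHAR(100)
    );
""")
conn.execute("INSERT INTO Users VALUES (1, 'Jane', 'Doe', 'jane@example.com')")
conn.executemany(
    "INSERT INTO Orders (userId, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)

# The join follows the foreign key from Orders back to Users.
rows = conn.execute("""
    SELECT Users.firstName, Orders.item
    FROM Orders JOIN Users ON Users.id = Orders.userId
    ORDER BY Orders.id
""").fetchall()
```

Each extra table that a query has to join adds work for the database, which is where the join cost of a relational design shows up.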

Non-Relational Database

Non-Relational Database: A non-relational database (NoSQL) is a database that stores data in flexible formats such as key-value pairs, documents, wide-column stores, or graphs. They do not require a fixed schema and are optimized for horizontal scalability and high-performance data operations. Examples include MongoDB (document store), Redis (key-value store), and Cassandra (wide-column store).

    // Example: Non-Relational Database Document
    {
        "id": 12,
        "firstName": "Jane",
        "lastName": "Doe",
        "email": "jane@example.com"
    }
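
The flexible-schema idea can be sketched with a plain Python dict standing in for a document collection. This is not a real database client, and the second document's fields are invented for the example.

```python
# A plain dict standing in for a document collection (not a real database client).
users = {}

def put(doc):
    users[doc["id"]] = doc

put({"id": 12, "firstName": "Jane", "lastName": "Doe", "email": "jane@example.com"})
# A second document with a different shape: no schema change is needed.
put({"id": 13, "firstName": "John", "address": {"city": "New York", "state": "NY"}})

def city_of(user_id):
    # Nested data lives inside the document, so no join is required to read it.
    return users[user_id].get("address", {}).get("city")
```

The trade-off is that the application, not the database, has to cope with documents that lack a field, as `city_of` does for the first document.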

Scaling

Intro

Scaling: Determines how a system can handle increased traffic, data volume, or computation load. Different aspects of a system require different scaling approaches:

Traffic spikes: Additional servers or resources via horizontal scaling
Data growth: Database may need to be partitioned or sharded via horizontal scaling
Geographic distribution: Serving users across multiple regions may require closer services via geographical or regional scaling
Compute heavy operations: Specific components may need more powerful servers or parallelized processing via vertical scaling
etc...

The type of scaling chosen depends on the problem/bottleneck the system is facing.

Vertical Scaling

Vertical Scaling: Vertical scaling, or 'scaling up', involves adding resources to a single instance to handle increased load.

This could mean upgrading the CPU, RAM, or storage.

Pros:

Simple to implement with no changes to application logic, just need $
Useful for applications with limited traffic

Cons:

Will be limited by the physical capabilities of a single machine
Will be expensive as hardware upgrade cost increases

Ex: Upgrading web server from 16GB to 64GB of RAM to handle more simultaneous users.

Horizontal Scaling

Horizontal Scaling: Horizontal scaling, or 'scaling out', involves adding more machines or node instances to a system to distribute load. This allows applications to handle increased traffic by spreading requests across multiple servers.

Pros:

Can keep scaling by adding more nodes, well past the limits of a single machine
Improves fault tolerance and redundancy

Cons:

Requires more complex distributed architecture and load balancing
Harder to maintain data consistency across nodes

Ex: Running a web application across 5 servers behind a load balancer to handle spikes in traffic.

Load Balancer: Server Tier Scaling

Intro

A load balancer improves the reliability and scalability of a system. It sits between clients and web servers and distributes traffic so that no single server is overwhelmed.

A load balancer allows us to handle server failures, traffic spikes, and dynamic scaling without downtime.

Load Balancer

Load Balancer: A load balancer evenly distributes incoming traffic among web servers that are defined in a load balanced set or 'server pool/target group'.

When traffic arrives, the load balancer uses a policy such as round-robin, least connections, or weighted rules to decide which server in the set should handle the request.
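
Two of these policies can be sketched in a few lines of Python. The class names and the server addresses are illustrative assumptions; production load balancers also track health, weights, and timeouts.

```python
import itertools

class RoundRobin:
    """Hand out servers in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the server with the fewest requests currently in flight."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round-robin ignores how busy each server is, while least connections adapts to slow or long-lived requests at the cost of tracking state per server.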

For better security, private IPs are used for communication between servers, making the web servers unreachable directly by clients.

Private IP Address: A private IP address is an IP address reserved for use inside a private network (e.g., home, office, data center). They are not routable on the public internet and can only be used by machines on the same network.

Common private ranges include:

10.0.0.0 - 10.255.255.255
172.16.0.0 - 172.31.255.255
192.168.0.0 - 192.168.255.255
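
The three ranges above correspond to the CIDR blocks 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. A short sketch with Python's standard `ipaddress` module checks whether an address falls inside them; the helper name `is_private` is made up for this example.

```python
import ipaddress

# The three private ranges listed above, written as CIDR blocks.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)
```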

Private IPs help conserve public address space and add a layer of security as external hosts cannot directly reach them without network address translation (NAT).

NAT: Network Address Translation (NAT) is a networking process that maps one set of IP addresses to another, allowing devices in a private network to communicate with external networks (e.g., the public internet or another private network) using a single public IP address.

It is most commonly used by routers to let multiple devices with private IP addresses share one public IP. NAT helps conserve public address space and adds a layer of security, since internal addresses are hidden from the outside.

Reliability

If server 1 goes offline, all traffic is routed to server 2, which prevents the website from going offline. After a failure, we can also add a new healthy web server to the server pool to rebalance the load.

Scaling

If website traffic grows rapidly and two servers are no longer enough, we can just add more servers to the web server pool and the load balancer can automatically start sending requests to them.
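
The reliability and scaling behavior described above can be sketched as a pool that only routes to healthy servers. The `ServerPool` class and the server names are assumptions for illustration; a real load balancer would detect failures through periodic health checks rather than an explicit `mark_down` call.

```python
class ServerPool:
    def __init__(self, servers):
        self.healthy = list(servers)
        self._next = 0

    def mark_down(self, server):
        # Stop routing to a server that failed.
        if server in self.healthy:
            self.healthy.remove(server)

    def add(self, server):
        # New servers start receiving traffic on the next request.
        if server not in self.healthy:
            self.healthy.append(server)

    def route(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers")
        server = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return server
```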

    // Example: NAT translating private IPs to a public IP
    Private IPs: 192.168.1.10, 192.168.1.25, 192.168.1.30
    Public IP (router): 203.0.113.5

    // Outbound request:
    192.168.1.10:5050  ---> 203.0.113.5:5500 (mapped by NAT)
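
The mapping in that example can be modeled as a translation table keyed by private address and port. This is a simplified sketch of port-based NAT; the `Nat` class and the starting port 5500 are assumptions chosen to match the example above, not how any particular router allocates ports.

```python
class Nat:
    def __init__(self, public_ip, first_port=5500):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self.outbound:
            # Allocate a fresh public port for each new private endpoint.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        # Replies are mapped back; unknown ports have no internal target.
        return self.inbound.get(public_port)
```

Because an inbound packet is only delivered when a mapping already exists, hosts behind the NAT are not directly reachable from outside.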

Diagram: Load Balancer

                                              1) ------>
               +-------------------------+   api.mysite.com     +--------+
               | Web Browser, Mobile App |                      |  DNS   |     
               +-------------------------+                      +--------+    
                         |        ^            2) <------    
                         |        |           15.125.23.214     
                         |        |       
    3)                   |        |  4)    
Public IP: 15.125.23.214 |        |  HTML Page 
                         |        |    
                         |        |       
                         |        |      
                         V        |
                    +------------------+
                    |  Load Balancer   |
                    +------------------+
                       /             \
Private IP: 10.0.0.1  /               \     Private IP: 10.0.0.2
                     /                 \
        +----------+                  +----------+
        | Server 1 |                  | Server 2 |
        +----------+                  +----------+

Database Replication: Data Tier Scaling

Intro

Database replication improves the availability, reliability, and read performance of a system by creating copies of a database across multiple servers. This allows read-heavy workloads to be distributed across multiple replicas while writes are coordinated by a primary (master) database. It also provides fault tolerance.

Database Replication

Database Replication: Database replication is the process of copying and maintaining the same database objects, such as tables, in multiple databases (replicas) to improve availability and read performance.

The databases are split into a primary and replica format where the primary handles all write operations and propagates changes to replicas, and reads are handled by the replicas.

Performance: All writes happen on the primary (master) node, while reads are distributed across the replica nodes. This allows more queries to be processed in parallel, improving performance.

Reliability: If a database server is destroyed, data is still preserved because it is replicated across multiple locations.

Availability: If a database goes offline, we can access the data stored on another database server, allowing the website to remain operational.

Scalability: Sending reads to replicas and writes to the primary lets us scale each more easily.

Replica Failure: If the only replica goes offline, read operations are temporarily directed to the primary until the issue is found and a new healthy replica is created. If there are multiple replicas and one goes offline, reads are redirected to the other healthy replicas and a new replica replaces the failed one.

Primary Failure: If the primary database goes offline, a replica is promoted to be the new primary, and a new replica is created to take the promoted replica's place. However, the promoted replica may not be up to date. Replication methods such as multi-master and circular replication could help, or we could run data recovery scripts to fill in the missing data.

Customization: replication can be synchronous where primary waits for replicas to confirm changes, or be asynchronous where replicas eventually catch up. The choice affects latency and consistency guarantees.
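
The read/write split and the two failure cases can be sketched with dicts standing in for database servers. `ReplicatedDatabase` is a hypothetical toy that replicates synchronously, so the promoted replica is always up to date here; with asynchronous replication it might lag, as noted above.

```python
import itertools

class ReplicatedDatabase:
    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._reads = itertools.count()

    def write(self, key, value):
        self.primary[key] = value
        # Synchronous replication: copy the change before returning.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # With no replicas left, reads fall back to the primary.
        if not self.replicas:
            return self.primary.get(key)
        replica = self.replicas[next(self._reads) % len(self.replicas)]
        return replica.get(key)

    def fail_primary(self):
        # Promote the first replica to be the new primary.
        self.primary = self.replicas.pop(0)
```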

Diagram: Database Replication

                         +----------------------+
                         |      Web Server      |
                         +----------------------+
                           |                |
                           |                |
                           |                |    
            (Writes)       |                |       (Reads)
                           |                |
        - - - - - - - - - -                 - - - - - - - - - - - - - - - - - -
        |                                                         |     |     |
        |                (DB Replication)   +--------------+      |     |     |
        |                   - - - - - - - > |  Slave DB 1  | - - -      |     |
        |                   |               +--------------+            |     |
    +-------------------+   |                                           |     |
    |   Master Database | - |               +--------------+            |     |
    +-------------------+   - - - - - - - > |  Slave DB 2  | - - - - - -      |
                            |               +--------------+                  |
                            |                                                 |
                            |               +--------------+                  |
                            - - - - - - - > |  Slave DB 3  | - - - - - - - - - 
                                            +--------------+

Cache: Load and Response Time Tier

Intro

Caching is a technique that stores frequently accessed data closer to the client or in faster storage so that repeated requests can be served faster.

By reducing the number of requests to the backend servers or databases, caching improves response time, reduces load, and increases scalability.

It's particularly useful in read-heavy workloads and in scenarios where data does not change frequently.

Cache Tier

Cache: A cache temporarily stores copies of data that would otherwise be retrieved from a slower backend, such as a database or remote API.

Caches can exist at different levels:

Client Side Cache: Stored in the browser or app (e.g., local storage, service workers).

CDN Edge Cache: Cached at network edge servers to reduce latency for geographically distributed users.

Server Cache: External in memory (e.g., Redis, Memcached) controlled by the server to accelerate API responses and usually put in front of a database.

Database Cache: Some relational databases or ORMs have built-in query caching.

Pros:

Backend Load: Offloads repeated requests from the main servers
Response Time: Serves frequently accessed data faster
Scalability: Allows the cache tier to scale independently from the backend

Cons:

Inconsistency: Cache data may become outdated if source changes
Cache Invalidation: When and what to remove from cache is hard to determine, especially in distributed systems

Diagram: Read-Through Cache Strategy

                +-----------------+                 1.
                |   Web Server    | - - - - - - -   if data exists in cache,
                +-----------------+             |   read data from cache
                    |     |                     |
                    |     |   2.1               |
  2.                |     |   Save data         | 
  If data does not  |     |   to cache          v
  exist in cache,   |     |               +------------+
  query database    |     | - - - - - - > |    Cache   |               
                    v                     +------------+
                  +------------+ 
                  |  Database  |        
                  +------------+
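
The flow in the diagram can be written as a short Python sketch. The dicts and the `stats` counter are stand-ins for a real cache server and database, and in this version the web server code performs both lookups itself.

```python
DATABASE = {"user:12": {"id": 12, "firstName": "John"}}  # hypothetical data
cache = {}
stats = {"db_queries": 0}

def query_database(key):
    stats["db_queries"] += 1
    return DATABASE.get(key)

def get(key):
    # 1. If the data exists in the cache, read it from the cache.
    if key in cache:
        return cache[key]
    # 2. Otherwise query the database...
    value = query_database(key)
    # 2.1 ...and save the result to the cache for the next request.
    if value is not None:
        cache[key] = value
    return value
```

Calling `get("user:12")` twice hits the database only once; the second call is served from the cache.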

Cache Considerations

When to use a cache: A cache is practical when data is read frequently but modified infrequently. Because a cache server keeps data in volatile memory and loses it on restart, important data should be saved in a persistent data store rather than only in the cache.

Expiration Policy: Set an expiration policy so that cached data expires and is removed from the cache once it reaches a certain age. Without one, cached data stays in memory until it is evicted.
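
A time-to-live (TTL) is one common way to express such a policy. The sketch below is a minimal illustration, with a hypothetical `TTLCache` class and an injectable clock so the expiry can be exercised without waiting.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired entries are removed lazily, on the next read.
            del self._items[key]
            return None
        return value
```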

Consistency: Keeping the database and cache in sync is challenging. Inconsistencies occur because data-modifying operations on the data store and the cache are not done in a single transaction. The difficulty increases when scaling across multiple regions; a good reference is 'Scaling Memcache at Facebook'.

Mitigating Failures: A single cache server is a single point of failure. Running multiple cache servers across different data centers is recommended, as is overprovisioning memory by a certain percentage to provide a buffer as memory usage grows.

Eviction Policy: Once a cache is full, new requests to add items will cause existing items to be removed. Least recently used (LRU) is the most popular cache eviction policy. Least frequently used (LFU), First in First Out (FIFO) can also be used to satisfy different use cases.
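
LRU eviction can be sketched with `collections.OrderedDict`, which keeps keys in order of use. The `LRUCache` class below is an illustrative toy, not the eviction implementation of any particular cache server.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # A read makes the key the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used key (the oldest entry).
            self._items.popitem(last=False)
```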

Content Delivery Network (CDN)

Intro

A Content Delivery Network (CDN) is a distributed network of servers that cache and deliver content closer to end users. CDNs improve performance, reduce latency, and help handle high traffic loads efficiently.

CDN

CDN: A CDN is a globally distributed network of edge servers that caches static and sometimes dynamic content. When a user requests content, the CDN serves it from the nearest edge server instead of the origin server, reducing load times and bandwidth usage.

CDNs are commonly used for data that changes less frequently:

Images, videos, and static files (CSS, JS)
API responses (with caching policies)
Web page information to accelerate delivery

Key benefits include improved response times, reduced origin server load, and better handling of geographically distributed traffic.

A CDN is essentially a public, distributed cache with nuances.

Public/Shared: Part of a global network that serves many users
Edge Caching: It stores content closer to end users geographically
Not just caching: Provides routing, DDoS protection, SSL termination, etc
Cache control: You can configure what gets cached, TTL, invalidation, etc

    +---------+
    | Client1 | - -                  +------------+
    +---------+   |                  |   Origin   |
                  |                  | (web/orig) |
                  |                  +------------+
    +---------+   |                         ^
    | Client2 | - - - - - - - - - - - - - - |
    +---------+         |               120 ms
                        |   40 ms
                        v
                +-------------+
                |     CDN     |
                | (edge cache)|
                +-------------+

CDN Considerations: Cost, CDN Failure, Cache Expiry, Cache Invalidation

Cost: CDNs are run by third-party providers, and you are charged for data transfers in and out of the CDN, so cache only frequently used content.

Appropriate Cache Expiry: The cache expiration time should be neither too long, which serves stale content, nor too short, which forces content to be reloaded from the origin to the CDN too often.

CDN Fallback: Consider how the application copes with a CDN failure. During a temporary CDN outage, clients should be able to detect the problem and request resources from the origin instead.
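
A client-side fallback could look roughly like the sketch below. The two fetch functions are stand-ins that simulate a CDN outage and a healthy origin; a real client would make network requests and would likely treat timeouts and error status codes as failures too.

```python
def fetch_with_fallback(path, fetch_from_cdn, fetch_from_origin):
    try:
        return fetch_from_cdn(path)
    except ConnectionError:
        # The CDN is unreachable, so go straight to the origin.
        return fetch_from_origin(path)

# Stand-ins for real network calls (hypothetical, for illustration only).
def broken_cdn(path):
    raise ConnectionError("CDN unavailable")

def origin(path):
    return f"origin:{path}"
```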

Invalidation: A cached object can be removed before it expires either through APIs provided by the CDN vendor or through object versioning, which serves a newer version of the object.
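
One way to do object versioning is to put a version marker in the URL, so that changed content gets a new URL that the CDN has not cached yet. The sketch below derives the marker from a content hash; the host name, the `v` query parameter, and the helper name are all assumptions for the example.

```python
import hashlib

CDN_HOST = "https://cdn.example.com"  # hypothetical CDN host

def versioned_url(path, content: bytes):
    # A short content hash changes whenever the file changes.
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{CDN_HOST}{path}?v={digest}"
```

Unchanged files keep the same URL and stay cached, while an updated file is fetched fresh from the origin on first request.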

State and Stateless Web Tier

Intro

The web tier (the application servers that handle HTTP requests) can be designed as stateful or stateless. How state is managed directly affects scalability, fault tolerance, and complexity.

In stateful architecture, the web tier stores client session data (state) on the server itself.

In stateless architecture, the web tier does not keep client state. Each request is independent and must either carry its own context (cookies, tokens) or reference state stored in an external DB/cache.

Stateful Architecture

Stateful Architecture: The server maintains session state for each client across multiple requests.

Examples of Stateful:

Logged-in sessions stored in server memory

+----------+           +----------+           +---------+
|  User A  |           |  User B  |           | User C  |
+----------+           +----------+           +---------+
     | HTTP Req             | HTTP Req            | HTTP Req
     v                      v                     v
+--------------+      +--------------+      +-------------+
|   Server 1   |      |   Server 2   |      |   Server 3  |
+--------------+      +--------------+      +-------------+
| - Session A  |      | - Session B  |      | - Session C |
| - Profile A  |      | - Profile B  |      | - Profile C |
+--------------+      +--------------+      +-------------+

Stateless Architecture

Stateless Architecture: The server maintains no session state for each client across multiple requests.

Examples of Stateless:

Logged-in sessions stored in a shared session DB/cache, or carried in each request as a token (e.g., JWT)

+---------+       +---------+       +---------+
| User A  |       | User B  |       | User C  |
+---------+       +---------+       +---------+
     |                |                |            HTTP Req
     v                v                v
                 +-------------+
                 |   Server 1  |
                 +-------------+
                       |
                       v
            +-------------------------+
            |     Shared Session DB   |
            +-------------------------+
            | - Session A / Profile A |
            | - Session B / Profile B |
            | - Session C / Profile C |
            +-------------------------+
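
The difference between the two designs can be sketched in Python. Both classes are hypothetical toys, with a dict standing in for the shared session DB.

```python
class StatefulServer:
    """Keeps sessions in its own memory."""
    def __init__(self):
        self.sessions = {}

    def login(self, user):
        self.sessions[user] = {"profile": f"Profile {user}"}

    def handle(self, user):
        return self.sessions.get(user)

class StatelessServer:
    """Keeps nothing locally; every lookup goes to the shared store."""
    def __init__(self, store):
        self.store = store

    def login(self, user):
        self.store[user] = {"profile": f"Profile {user}"}

    def handle(self, user):
        return self.store.get(user)
```

With stateful servers, a user who logged in on one server is unknown to the others, so requests must keep going to the same server. With the shared store, any server can handle any request.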

===============================================================

+---------+       +---------+       +---------+
| User A  |       | User B  |       | User C  |
+---------+       +---------+       +---------+
        |              |              |
        |              |              |    HTTP Req with JWT / token
        |              |              |
        v              v              v
                 +-------------+
                 |   Server 1  |
                 +-------------+
                 |  Stateless  |
                 |  No session |
                 |  stored     |
                 +-------------+
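
The token-based variant works because the server can verify a token without storing anything per user. The sketch below uses an HMAC-signed payload to show the idea. It is a simplified stand-in, not a real JWT implementation, and the secret and function names are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical signing key; keep real keys out of source code

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode()

def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())

def issue_token(claims: dict) -> str:
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"

def verify_token(token: str):
    payload, _, signature = token.partition(".")
    # Recompute the signature; a mismatch means the token was altered.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any server that knows the secret can validate the token, so requests from the same user can land on different servers without a session lookup.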

Chapter 2: Back Of The Envelope Estimation

Intro

Back-of-the-envelope estimates are calculations made from a combination of thought experiments and common performance numbers to get a feel for which design strategies will meet the system's requirements.
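
As an illustration of the arithmetic involved, the sketch below estimates request rate and daily storage. Every input is a made-up assumption chosen to keep the numbers round, not a figure from the textbook.

```python
# Every input below is a hypothetical assumption for illustration.
DAILY_ACTIVE_USERS = 10_000_000
REQUESTS_PER_USER_PER_DAY = 20
PEAK_FACTOR = 2                 # assume peak traffic is twice the average
WRITES_PER_USER_PER_DAY = 2
BYTES_PER_WRITE = 1_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average and peak queries per second (QPS).
average_qps = DAILY_ACTIVE_USERS * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = average_qps * PEAK_FACTOR

# New data stored per day, in gigabytes (1 GB = 10^9 bytes).
daily_storage_gb = (
    DAILY_ACTIVE_USERS * WRITES_PER_USER_PER_DAY * BYTES_PER_WRITE / 1e9
)
```

With these inputs the average comes to roughly 2,300 QPS, the peak to roughly 4,600 QPS, and storage grows by about 20 GB per day. The point is the order of magnitude, not the exact figure.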

Chapter 3: A Framework For System Design Interviews

Textbook: System Design Interview Vol I