System Design — How Do Big Things Work?

Every software engineer encounters system design at some point. It is the discipline of building systems that handle large scale, high traffic, and complex data flows. This article introduces the core concepts through concrete examples.

Key sources: "Designing Data-Intensive Applications" by Martin Kleppmann, "System Design Interview" by Alex Xu.


What Is System Design?

System design is the process of defining the architecture, components, modules, and data flow of a system to satisfy specific requirements. It answers questions like:

  • How does Netflix serve millions of viewers simultaneously?
  • How does WhatsApp deliver messages instantly across continents?
  • How does Uber match riders with nearby drivers in real time?

These are not abstract problems. They have concrete solutions that follow reusable patterns.


Why System Design Matters

A simple web app serving 100 users can run on a single server. The same app serving 10 million users requires a fundamentally different architecture. System design is what bridges that gap.

Consider a photo-sharing app:

  • At 100 users: Store photos on the server's hard drive. One database. One web server. Done.
  • At 1 million users: Photos need object storage (S3). Multiple web servers behind a load balancer. A database cluster with read replicas. A CDN for serving images globally. A caching layer (Redis) for hot content.
  • At 100 million users: Geographic sharding. Content delivery optimization. Video transcoding pipelines. Recommendation models. Message queues for async processing.

Each scale introduces new problems. System design provides the patterns to solve them.


The Building Blocks

Every large system is composed of a few fundamental building blocks:

Load Balancers

A load balancer distributes incoming traffic across multiple servers so that no single server is overwhelmed. If a server fails its health checks, the load balancer stops sending it requests and routes traffic to the healthy ones.

Real example: An e-commerce site during a flash sale. Without a load balancer, one server gets all traffic and crashes. With one, traffic is spread across 20 servers, and the site stays up.
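
To make the idea concrete, here is a minimal sketch of round-robin balancing with health tracking. It is one of several common strategies, not how any particular product works, and the class name, method names, and server names ("web-1" and so on) are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # In a real deployment a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Inspect each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [balancer.next_server() for _ in range(3)]

balancer.mark_down("web-2")  # simulate a crashed server
after_failure = [balancer.next_server() for _ in range(4)]

print(first_round)    # ['web-1', 'web-2', 'web-3']
print(after_failure)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Round-robin is the simplest policy. Production load balancers often weigh servers by capacity or by current connection count instead.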

Caching

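Caching stores frequently accessed data in fast storage (usually RAM, sometimes a local SSD) so it does not have to be fetched from a slower database on every request. The first request misses the cache and pays the full cost. Subsequent requests are served from the cache until the entry expires or is evicted.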

Real example: Twitter trending topics. Millions of users see the same trending list. Instead of computing it from scratch for every request, the service can cache the result for a short window, say 60 seconds. One computation then serves millions of requests.
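
A rough sketch of that pattern is a cache whose entries expire after a time-to-live (TTL). Everything here is illustrative: `TTLCache`, `compute_trending`, and the topic names are hypothetical, and the clock is injected so the example can simulate time passing without actually waiting.

```python
import time


class TTLCache:
    """Caches computed values and recomputes them once they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()  # cache miss: do the slow work once
        self._entries[key] = (now + self.ttl, value)
        return value


calls = {"count": 0}


def compute_trending():
    # Stand-in for an expensive aggregation over recent posts.
    calls["count"] += 1
    return ["#topic-a", "#topic-b"]


fake_now = [0.0]  # simulated clock, in seconds
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

for _ in range(1000):
    cache.get_or_compute("trending", compute_trending)
print(calls["count"])  # 1: a thousand requests, one computation

fake_now[0] = 61.0  # the entry has now expired
cache.get_or_compute("trending", compute_trending)
print(calls["count"])  # 2
```

The TTL is the trade-off knob: a longer TTL means fewer recomputations but staler results.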

Databases

Databases store data persistently. The choice depends on the data model:

  • Relational databases (PostgreSQL, MySQL): Structured data with relationships. Good for transactions, user accounts, financial records.
  • NoSQL databases (MongoDB, Cassandra): Flexible schemas, horizontal scaling. Good for product catalogs, event logs, user sessions.
  • In-memory stores (Redis): Blazing fast key-value access. Good for caching, real-time counters, session management.
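
As a small illustration of why relational databases suit transactional data, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the names, and the balances are invented for the example. The point is that a failed transfer is rolled back as a whole, so no money is created or lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically. Returns True on success, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is undone as part of the same transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)  # alice cannot afford this
balances = dict(conn.execute("SELECT id, balance FROM accounts"))

print(ok, rejected)  # True False
print(balances)      # {'alice': 70, 'bob': 50}
```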

Message Queues

Message queues enable asynchronous communication between services. A producer sends a message. A consumer processes it later. This decouples services and absorbs traffic spikes.

Real example: When you upload a video to YouTube, you get an immediate response. Behind the scenes, the upload service publishes a message to a queue, and a transcoding worker, a thumbnail generator, and a content moderation pipeline each pick it up and run independently.
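
The same shape can be sketched in a single process with Python's standard `queue` and `threading` modules. This is only an analogy for a real message broker: the consumer names, `handle_upload`, and the video ID are hypothetical, and each consumer gets its own queue so that every one of them receives a copy of the event.

```python
import queue
import threading

# One queue per downstream consumer, so each gets its own copy of the event.
queues = {
    "transcode": queue.Queue(),
    "thumbnail": queue.Queue(),
    "moderation": queue.Queue(),
}
results = []
results_lock = threading.Lock()


def worker(name, q):
    """Consumer: process messages until the shutdown sentinel arrives."""
    while True:
        video_id = q.get()
        if video_id is None:  # sentinel value: stop this worker
            break
        with results_lock:
            results.append((name, video_id))  # stand-in for the real work


def handle_upload(video_id):
    """Producer: enqueue the work and return without waiting for it."""
    for q in queues.values():
        q.put(video_id)
    return {"status": "accepted", "video_id": video_id}


threads = [
    threading.Thread(target=worker, args=(name, q)) for name, q in queues.items()
]
for t in threads:
    t.start()

response = handle_upload("video-123")

# Shut down cleanly: one sentinel per worker, then wait for all of them.
for q in queues.values():
    q.put(None)
for t in threads:
    t.join()

print(response)         # {'status': 'accepted', 'video_id': 'video-123'}
print(sorted(results))
```

Because the producer only enqueues, a burst of uploads makes the queues longer rather than making the upload request slower.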

CDNs (Content Delivery Networks)

A CDN caches static content (images, videos, CSS, JavaScript) at edge servers worldwide. Users download content from the closest server instead of a central one.

Real example: Streaming a Netflix show from Singapore while the origin server is in California. Without a CDN, every byte crosses the Pacific Ocean. With one, the content is cached in a Singaporean edge server, and playback starts in seconds.
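
"Closest server" can be approximated by geographic distance, as in the sketch below. The edge names and coordinates are assumptions for illustration. Real CDNs typically route with DNS or anycast and take network conditions into account, so treat this as a simplified model of the idea and not a description of how any CDN picks an edge.

```python
import math

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGES = {
    "singapore": (1.35, 103.82),
    "frankfurt": (50.11, 8.68),
    "california": (37.77, -122.42),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(user_location):
    """Return the name of the edge with the smallest distance to the user."""
    return min(EDGES, key=lambda name: haversine_km(user_location, EDGES[name]))


jakarta = (-6.21, 106.85)
print(nearest_edge(jakarta))  # singapore

# Distance a request would travel without a nearby edge:
print(round(haversine_km(EDGES["singapore"], EDGES["california"])), "km")
```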


A Simple Example: URL Shortener

Let us design a URL shortener (like bit.ly) to see these patterns in action.

Requirements:

  • Accept a long URL and return a short alias
  • Redirect short URLs to the original
  • Handle 100 million URL creations per month
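
Before drawing boxes, it helps to turn that last requirement into per-second numbers. The back-of-the-envelope calculation below uses the 100 million figure from the requirements. The read-to-write ratio, record size, and retention period are assumptions chosen only to show the arithmetic.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000

urls_per_month = 100_000_000  # from the requirements
writes_per_second = urls_per_month / SECONDS_PER_MONTH

# Assumptions for illustration, not part of the stated requirements:
reads_per_write = 100    # redirects per created URL
bytes_per_record = 500   # short code + long URL + metadata
retention_years = 5

reads_per_second = writes_per_second * reads_per_write
total_records = urls_per_month * 12 * retention_years
storage_tb = total_records * bytes_per_record / 1e12

print(round(writes_per_second, 1))  # 38.6 URL creations per second
print(round(reads_per_second))      # 3858 redirects per second
print(total_records)                # 6000000000 records
print(storage_tb)                   # 3.0 TB
```

Under these assumptions the write load is modest and the system is read-heavy, which is why the cache in front of the database matters more than write throughput.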

Architecture:

  1. A web server receives the long URL and generates a unique short code (e.g., by Base62-encoding an auto-incrementing ID)
  2. A database stores the mapping from short code to long URL
  3. A cache (Redis) stores recently accessed mappings so that popular URLs are served without a database query
  4. A load balancer distributes traffic across multiple web servers
  5. For analytics, a message queue carries click events to a separate analytics service so that the redirect is never slowed down
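
The core of this architecture fits in a few lines. The sketch below is a single-process toy under stated assumptions: plain dictionaries stand in for the database and for Redis, a list stands in for the message queue, and the names `UrlShortener`, `shorten`, and `resolve` are hypothetical.

```python
import string

# 0-9, a-z, A-Z: 62 characters in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class UrlShortener:
    def __init__(self):
        self._next_id = 1       # stand-in for an auto-incrementing ID
        self._db = {}           # stand-in for the database
        self._cache = {}        # stand-in for Redis
        self.click_events = []  # stand-in for the message queue

    def shorten(self, long_url):
        code = base62_encode(self._next_id)
        self._next_id += 1
        self._db[code] = long_url
        return code

    def resolve(self, code):
        long_url = self._cache.get(code)
        if long_url is None:  # cache miss: fall back to the database
            long_url = self._db.get(code)
            if long_url is None:
                return None   # unknown short code
            self._cache[code] = long_url
        self.click_events.append(code)  # analytics happens elsewhere, later
        return long_url


print(base62_encode(61), base62_encode(62), base62_encode(125))  # Z 10 21

shortener = UrlShortener()
code = shortener.shorten("https://example.com/a/very/long/path")
print(code, shortener.resolve(code))
```

One design consequence worth noting: codes derived from sequential IDs are short but guessable, so a real service might shuffle or randomize them if that matters.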

The key insight: the system is simple at its core. The complexity comes from scale.


The Design Process

When approaching a system design question, follow this process:

  1. Clarify requirements. What are you building? How many users? What are the traffic patterns?
  2. Estimate scale. How much data? How many requests per second? How much storage?
  3. Design the data model. What entities exist? How are they related?
  4. Design the high-level architecture. What components are needed? How do they communicate?
  5. Identify bottlenecks. What will break first under load?
  6. Add optimizations. Caching, replication, sharding, CDNs.

Key Takeaways

  1. System design is about making trade-offs, not finding perfect solutions.
  2. Start simple. Add complexity only when the scale demands it.
  3. Every large system uses the same few building blocks: load balancers, caches, databases, queues, CDNs.
  4. The best design is the one that meets requirements with the least complexity.

For your next system design discussion: Start by asking "What scale are we designing for?" The answer determines every subsequent decision.