Core Resources

Appendix

You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers of two table and Latency numbers every programmer should know are handy references.

Powers of two table

Power          Exact Value   Approx Value  Bytes
------------------------------------------------
7                      128
8                      256
10                   1,024   1 thousand     1 KB
16                  65,536                 64 KB
20               1,048,576   1 million      1 MB
30           1,073,741,824   1 billion      1 GB
32           4,294,967,296                  4 GB
40       1,099,511,627,776   1 trillion     1 TB
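
As a rough illustration of how the table supports a memory estimate, the sketch below sizes a hypothetical data structure. The entry count (10 million), the per-entry size (64 bytes), and the helper name `human_readable` are assumptions made for the example, not figures from this guide.

```python
# Binary units keyed by their power of two, taken from the table above.
UNITS = [(40, "TB"), (30, "GB"), (20, "MB"), (10, "KB")]


def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest power-of-two unit that fits."""
    for power, name in UNITS:
        if num_bytes >= 2 ** power:
            return f"{num_bytes / 2 ** power:.1f} {name}"
    return f"{num_bytes} B"


# Hypothetical workload: 10 million entries at an assumed 64 bytes each.
entries = 10_000_000
bytes_per_entry = 64
total_bytes = entries * bytes_per_entry

print(human_readable(total_bytes))  # 610.4 MB
```

For a quick mental check, 10 million is close to 10 x 2^20 and 64 is 2^6, so the total is about 640 x 2^20 bytes, or somewhat over half a gigabyte.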

Source(s) and further reading

Latency numbers every programmer should know

Latency Comparison Numbers
--------------------------
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns                      14x L1 cache
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns                      20x L2 cache, 200x L1 cache
Compress 1 KB with Zippy                 10,000 ns       10 us
Send 1 KB over 1 Gbps network            10,000 ns       10 us
Read 4 KB randomly from SSD*            150,000 ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory      250,000 ns      250 us
Round trip within same datacenter       500,000 ns      500 us
Read 1 MB sequentially from SSD*      1,000,000 ns    1,000 us    1 ms  ~1GB/sec SSD, 4x memory
HDD seek                             10,000,000 ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps   10,000,000 ns   10,000 us   10 ms  40x memory, 10x SSD
Read 1 MB sequentially from HDD      30,000,000 ns   30,000 us   30 ms  120x memory, 30x SSD
Send packet CA->Netherlands->CA     150,000,000 ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
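
The thumbnail question from the start of this appendix can be estimated from the table above. The sketch below is only a rough estimate of the time spent reading the source images. It assumes 100 images of 1 MB each, one seek per image on an HDD, and no seek cost on an SSD, and it ignores the CPU time needed to resize. The image size and the function names are assumptions made for illustration.

```python
# Latency figures copied from the table above, in nanoseconds.
HDD_SEEK_NS = 10_000_000
HDD_READ_1MB_NS = 30_000_000
SSD_READ_1MB_NS = 1_000_000

NS_PER_MS = 1_000_000  # from the notes: 1 ms = 1,000,000 ns


def hdd_read_time_ms(num_files: int, size_mb: float) -> float:
    """Estimate HDD time: one seek per file plus a sequential read."""
    per_file_ns = HDD_SEEK_NS + size_mb * HDD_READ_1MB_NS
    return num_files * per_file_ns / NS_PER_MS


def ssd_read_time_ms(num_files: int, size_mb: float) -> float:
    """Estimate SSD time: sequential read only, seek cost assumed negligible."""
    return num_files * size_mb * SSD_READ_1MB_NS / NS_PER_MS


# Assumed workload: 100 source images of 1 MB each.
print(hdd_read_time_ms(100, 1.0))  # 4000.0 (about 4 seconds)
print(ssd_read_time_ms(100, 1.0))  # 100.0 (about 0.1 seconds)
```

Under these assumptions the disk reads alone take about 4 seconds on an HDD and about 0.1 seconds on an SSD, which is usually enough precision to pick a design direction.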

Handy metrics based on numbers above:

  • Read sequentially from HDD at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from main memory at 4 GB/s
  • 6-7 world-wide round trips per second
  • 2,000 round trips per second within a data center
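
The handy metrics above follow from the latency table by simple division. The sketch below shows one way to derive them. The dictionary and function names are made up for this example, and the inputs are the table's own figures.

```python
NS_PER_SECOND = 1_000_000_000

# Time to read 1 MB sequentially, in nanoseconds, from the latency table.
READ_1MB_NS = {
    "HDD": 30_000_000,
    "1 Gbps Ethernet": 10_000_000,
    "SSD": 1_000_000,
    "main memory": 250_000,
}

# Round-trip times in nanoseconds, from the same table.
ROUND_TRIP_NS = {
    "world-wide": 150_000_000,  # CA->Netherlands->CA
    "within a data center": 500_000,
}


def throughput_mb_per_s(read_1mb_ns: int) -> float:
    """Sequential throughput implied by the time to read 1 MB."""
    return NS_PER_SECOND / read_1mb_ns


def round_trips_per_s(round_trip_ns: int) -> float:
    """Number of sequential round trips that fit in one second."""
    return NS_PER_SECOND / round_trip_ns


for medium, ns in READ_1MB_NS.items():
    print(f"{medium}: {throughput_mb_per_s(ns):,.0f} MB/s")

for path, ns in ROUND_TRIP_NS.items():
    print(f"{path}: {round_trips_per_s(ns):,.1f} round trips/s")
```

The HDD figure comes out near 33 MB/s, which the list rounds down to 30 MB/s. The other values match the list directly.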

Latency numbers visualized

Source(s) and further reading

Additional system design interview questions

Common system design interview questions, with links to resources on how to solve each.

Question                                                     Reference(s)
-------------------------------------------------------------------------------------
Design a file sync service like Dropbox                      youtube.com
Design a search engine like Google                           queue.acm.org
                                                             stackexchange.com
                                                             ardendertat.com
                                                             stanford.edu
Design a scalable web crawler like Google                    quora.com
Design Google docs                                           code.google.com
                                                             neil.fraser.name
Design a key-value store like Redis                          slideshare.net
Design a cache system like Memcached                         slideshare.net
Design a recommendation system like Amazon's                 hulu.com
                                                             ijcai13.org
Design a tinyurl system like Bitly                           n00tc0d3r.blogspot.com
Design a chat app like WhatsApp                              highscalability.com
Design a picture sharing system like Instagram               highscalability.com
                                                             highscalability.com
Design the Facebook news feed function                       quora.com
                                                             quora.com
                                                             slideshare.net
Design the Facebook timeline function                        facebook.com
                                                             highscalability.com
Design the Facebook chat function                            erlang-factory.com
                                                             facebook.com
Design a graph search function like Facebook's               facebook.com
                                                             facebook.com
                                                             facebook.com
Design a content delivery network like CloudFlare            figshare.com
Design a trending topic system like Twitter's                michael-noll.com
                                                             snikolov.wordpress.com
Design a random ID generation system                         blog.twitter.com
                                                             github.com
Return the top k requests during a time interval             cs.ucsb.edu
                                                             wpi.edu
Design a system that serves data from multiple data centers  highscalability.com
Design an online multiplayer card game                       indieflashblog.com
                                                             buildnewgames.com
Design a garbage collection system                           stuffwithstuff.com
                                                             washington.edu
Design an API rate limiter                                   https://stripe.com/blog/
Design a Stock Exchange (like NASDAQ or Binance)             Jane Street
                                                             Golang Implementation
                                                             Go Implementation
Add a system design question                                 Contribute

Real world architectures

Articles on how real world systems are designed.

Source: Twitter timelines at scale

Don't focus on the nitty-gritty details of the following articles. Instead:

  • Identify shared principles, common technologies, and patterns within these articles
  • Study what problems are solved by each component, where it works, where it doesn't
  • Review the lessons learned

Type             System                                                                        Reference(s)
------------------------------------------------------------------------------------------------------------------
Data processing  MapReduce - Distributed data processing from Google                           research.google.com
Data processing  Spark - Distributed data processing from Databricks                           slideshare.net
Data processing  Storm - Distributed data processing from Twitter                              slideshare.net
Data store       Bigtable - Distributed column-oriented database from Google                   harvard.edu
Data store       HBase - Open source implementation of Bigtable                                slideshare.net
Data store       Cassandra - Distributed column-oriented database from Facebook                slideshare.net
Data store       DynamoDB - Document-oriented database from Amazon                             harvard.edu
Data store       MongoDB - Document-oriented database                                          slideshare.net
Data store       Spanner - Globally-distributed database from Google                           research.google.com
Data store       Memcached - Distributed memory caching system                                 slideshare.net
Data store       Redis - Distributed memory caching system with persistence and value types    slideshare.net
File system      Google File System (GFS) - Distributed file system                            research.google.com
File system      Hadoop File System (HDFS) - Open source implementation of GFS                 apache.org
Misc             Chubby - Lock service for loosely-coupled distributed systems from Google     research.google.com
Misc             Dapper - Distributed systems tracing infrastructure                           research.google.com
Misc             Kafka - Pub/sub message queue from LinkedIn                                   slideshare.net
Misc             Zookeeper - Centralized infrastructure and services enabling synchronization  slideshare.net
                 Add an architecture                                                           Contribute

Company architectures

Company         Reference(s)
----------------------------------------------------------------------------
Amazon          Amazon architecture
Cinchcast       Producing 1,500 hours of audio every day
DataSift        Realtime datamining At 120,000 tweets per second
Dropbox         How we've scaled Dropbox
ESPN            Operating At 100,000 duh nuh nuhs per second
Google          Google architecture
Instagram       14 million users, terabytes of photos
                What powers Instagram
Justin.tv       Justin.Tv's live video broadcasting architecture
Facebook        Scaling memcached at Facebook
                TAO: Facebook’s distributed data store for the social graph
                Facebook’s photo storage
                How Facebook Live Streams To 800,000 Simultaneous Viewers
Flickr          Flickr architecture
Mailbox         From 0 to one million users in 6 weeks
Netflix         A 360 Degree View Of The Entire Netflix Stack
                Netflix: What Happens When You Press Play?
Pinterest       From 0 To 10s of billions of page views a month
                18 million visitors, 10x growth, 12 employees
Playfish        50 million monthly users and growing
PlentyOfFish    PlentyOfFish architecture
Salesforce      How they handle 1.3 billion transactions a day
Stack Overflow  Stack Overflow architecture
TripAdvisor     40M visitors, 200M dynamic page views, 30TB data
Tumblr          15 billion page views a month
Twitter         Making Twitter 10000 percent faster
                Storing 250 million tweets a day using MySQL
                150M active users, 300K QPS, a 22 MB/S firehose
                Timelines at scale
                Big and small data at Twitter
                Operations at Twitter: scaling beyond 100 million users
                How Twitter Handles 3,000 Images Per Second
Uber            How Uber scales their real-time market platform
                Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories
WhatsApp        The WhatsApp architecture Facebook bought for $19 billion
YouTube         YouTube scalability
                YouTube architecture

Company engineering blogs

Architectures for companies you are interviewing with.

Questions you encounter might be from the same domain.

Source(s) and further reading

Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo: