System requirements
Functional:
- users can post, edit, delete tweets
- other users can like, reply, and repost
- users can add friends
- users should be able to block people as well
- optional: users can attach media (photos and videos) to tweets
Non-Functional:
Performance:
- low latency
- high concurrency
Availability:
- availability > consistency
- target 99.99% availability
- NoSQL database: availability, eventual consistency
Scalability:
- want good load balancing, esp as load increases
- document model DB like Mongo for tweets and media
- graph database for modeling friend and block relationships
Security:
- allow users to stay logged in on their devices
- allow users to reset password if they forgot
- allow users to delete account
Capacity estimation
- 1 million active users daily
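- 1000 bytes per tweet
- 10 tweets per user per day
- 1,000,000 users * 10 tweets * 1000 bytes = 10,000,000,000 bytes / day = 10,000 MB / day = 10 GB / day
- 10 GB / day * 365 = 3650 GB / year = 3.65 TB / year
- 3.65 TB / year * 5 = 18.25 TB over 5 years (tweet text only, media not included)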
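The estimate above can be double-checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and that the 10 tweets a day are per active user; the variable names are only illustrative.

```python
# Back-of-the-envelope storage estimate for tweet text only (media excluded).
DAILY_ACTIVE_USERS = 1_000_000
TWEETS_PER_USER_PER_DAY = 10   # assumption: the "10 tweets a day" figure is per user
BYTES_PER_TWEET = 1_000

bytes_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET
gb_per_day = bytes_per_day / 10**9          # decimal gigabytes
tb_per_year = gb_per_day * 365 / 1_000      # decimal terabytes
tb_per_5_years = tb_per_year * 5

print(f"{gb_per_day:.0f} GB/day, {tb_per_year:.2f} TB/year, "
      f"{tb_per_5_years:.2f} TB over 5 years")
```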
API design
/createTweet
(userId: UUID, message: string, media: array)
for each elem in media array
upload elem to S3 bucket (or similar cloud service)
if unsuccessful, return error
else, take the URL from cloud
create a new UUID for the photo / video
add new document to photos or videos collection respectively
create new tweet document
save tweet document into tweet collection DB
return error if unsuccessful, else return 200
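The /createTweet steps above could look roughly like this. It is a sketch under assumptions, not a real implementation: the in-memory dicts stand in for the Mongo collections, `upload_to_object_store` is a hypothetical placeholder for an S3 (or similar) upload, and the `{"kind", "data"}` shape of each media element is made up for illustration.

```python
import uuid
from datetime import datetime, timezone

# In-memory stand-ins for the tweet, photos, and videos collections (assumption).
tweets, photos, videos = {}, {}, {}

def upload_to_object_store(blob: bytes) -> str:
    """Hypothetical uploader; a real one would call S3 or a similar service."""
    return f"https://media.example.com/{uuid.uuid4()}"

def create_tweet(user_id: str, message: str, media: list[dict]) -> tuple[int, dict]:
    photo_ids, video_ids = [], []
    for elem in media:  # assumed shape: {"kind": "photo" | "video", "data": bytes}
        try:
            url = upload_to_object_store(elem["data"])
        except Exception as exc:
            return 500, {"error": f"media upload failed: {exc}"}
        media_id = str(uuid.uuid4())
        uploaded = datetime.now(timezone.utc)
        if elem["kind"] == "photo":
            photos[media_id] = {"photoId": media_id, "userId": user_id,
                                "photoUrl": url, "uploadDate": uploaded}
            photo_ids.append(media_id)
        else:
            videos[media_id] = {"videoId": media_id, "userId": user_id,
                                "videoUrl": url, "uploadDate": uploaded}
            video_ids.append(media_id)
    # Only create the tweet document once every upload has succeeded.
    tweet_id = str(uuid.uuid4())
    tweets[tweet_id] = {"tweetId": tweet_id, "userId": user_id, "text": message,
                        "photos": photo_ids, "videos": video_ids}
    return 200, {"tweetId": tweet_id}
```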
/updateTweet
(userId: UUID, newMessage: string, tweetId: UUID)
check if tweetId exists in tweet collection
if not, return error
else, update text in the respective tweet document
return error if unsuccessful, else 200
/deleteTweet
(userId: UUID, tweetId: UUID)
check if tweetId exists in DB
if not, return error
else, check if this tweet has photos and videos
if yes, delete from cloud
if successful, finally delete tweet from tweet collection
return error if unsuccessful, else 200
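A rough sketch of the /deleteTweet flow, with the same kind of in-memory stand-ins for the collections. `delete_from_object_store` is a hypothetical placeholder for the cloud delete call, and the ownership check is an assumption that the notes do not spell out (the endpoint takes a userId, so it seems natural to use it).

```python
# In-memory stand-ins for the collections, seeded with one tweet (assumption).
tweets = {"t1": {"tweetId": "t1", "userId": "u1", "text": "hello",
                 "photos": ["p1"], "videos": []}}
photos = {"p1": {"photoId": "p1", "userId": "u1",
                 "photoUrl": "https://media.example.com/p1"}}
videos = {}

def delete_from_object_store(url: str) -> bool:
    """Hypothetical cloud delete; returns True on success."""
    return True

def delete_tweet(user_id: str, tweet_id: str) -> tuple[int, dict]:
    tweet = tweets.get(tweet_id)
    if tweet is None:
        return 404, {"error": "tweet not found"}
    if tweet["userId"] != user_id:  # assumption: only the author may delete
        return 403, {"error": "not the author of this tweet"}
    # Delete attached media from the cloud first, then the media documents.
    for collection, key, url_field in ((photos, "photos", "photoUrl"),
                                       (videos, "videos", "videoUrl")):
        for media_id in tweet[key]:
            doc = collection.get(media_id)
            if doc is None:
                continue
            if not delete_from_object_store(doc[url_field]):
                return 500, {"error": "media delete failed"}
            del collection[media_id]
    # Finally delete the tweet itself.
    del tweets[tweet_id]
    return 200, {}
```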
/repostTweet
(userId: UUID, tweetId: UUID)
check if tweetId exists in DB
if not, return error
else, check if this tweet has photos and videos
if yes, save a copy of each for this new user
if successful, finally create a new tweet for this user in the tweet collection
return error if unsuccessful, else 200
/addFriend
(userId, friendId)
see if there exists a document in friends collection
if yes, return error saying already friends
else, create a new document
return error if unsuccessful, else 200
/removeFriend
(userId, friendId)
see if there exists a document in friends collection
if no, return error saying not friends
else, delete the existing document
return error if unsuccessful, else 200
/blockUser
(userId, blockedUserId)
see if there exists a document in blocked collection
if yes, return error saying already blocked
else, create new document in blocked collection
return error if unsuccessful, else 200
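The friend and block endpoints above share the same "check, then create or delete" shape. A minimal sketch, assuming friendship is symmetric (so the pair is stored unordered) and using in-memory sets in place of the friends and blocked collections; the status codes are illustrative choices, not part of the design above.

```python
# In-memory stand-ins for the friends and blocked collections (assumption).
friends: set[frozenset[str]] = set()
blocked: dict[str, set[str]] = {}

def add_friend(user_id: str, friend_id: str) -> tuple[int, str]:
    if user_id == friend_id:
        return 400, "cannot friend yourself"
    pair = frozenset((user_id, friend_id))  # unordered: (A, B) equals (B, A)
    if pair in friends:
        return 409, "already friends"
    friends.add(pair)
    return 200, "ok"

def remove_friend(user_id: str, friend_id: str) -> tuple[int, str]:
    pair = frozenset((user_id, friend_id))
    if pair not in friends:
        return 404, "not friends"
    friends.remove(pair)
    return 200, "ok"

def block_user(user_id: str, target_id: str) -> tuple[int, str]:
    # Blocking is one-directional, so it is keyed by the blocking user.
    already_blocked = blocked.setdefault(user_id, set())
    if target_id in already_blocked:
        return 409, "already blocked"
    already_blocked.add(target_id)
    return 200, "ok"
```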
/uploadPhoto
/uploadVideo
/deletePhoto
/deleteVideo
Database design
users
userId UUID
firstName string
lastName string
tweet
tweetId UUID
userId UUID
text string
photos arrayOfPhotoUUIDs
videos arrayOfVideoUUIDs
photos
photoId UUID
userId UUID
photoUrl string
uploadDate datetime
videos
videoId UUID
userId UUID
videoUrl string
uploadDate datetime
friends
connectionId UUID
userA UUID
userB UUID
blocked
userId UUID
blocked arrayOfUserUUIDs
sessions
userId UUID
ipAddress string
lastOnline datetime
expires datetime
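For reference, a few of the collections above written out as typed records. This is only a sketch: the field names follow the schema, but the snake_case naming, the defaults, and the `is_expired` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID, uuid4

@dataclass
class Tweet:
    user_id: UUID
    text: str
    tweet_id: UUID = field(default_factory=uuid4)
    photos: list[UUID] = field(default_factory=list)   # photo UUIDs
    videos: list[UUID] = field(default_factory=list)   # video UUIDs

@dataclass
class Friendship:
    user_a: UUID
    user_b: UUID
    connection_id: UUID = field(default_factory=uuid4)

@dataclass
class Session:
    user_id: UUID
    ip_address: str
    last_online: datetime
    expires: datetime

    def is_expired(self, now: datetime) -> bool:
        # Hypothetical helper: a session is invalid once its expiry has passed.
        return now >= self.expires
```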
High-level design
Request flows
Detailed component design
Trade offs/Tech choices
cache:
choosing write-behind
add/update entry to cache
async add to DB
vs
write-through
add/update entry to cache
cache adds/updates entry to DB
DB write is synchronous, so cache and DB stay consistent
however, cache becomes a single point of failure
vs
cache-aside
check cache for result
check DB if not in cache
add result to cache
send response to client
each cache miss adds latency, which hurts UX
LB:
- IP hash: LB chooses which server based on IP hash
vs
Round robin:
traffic is distributed to each server in a cycle
this does not take into account the current load on a server when giving a new task; uneven distribution of traffic
vs
least-traffic
LB chooses the server with the least traffic
more complexity, and does not guarantee that the chosen server is closest to the client; may add latency
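The three load-balancing strategies above differ only in how the next server is picked. A minimal sketch, assuming a fixed list of hypothetical server names and using the number of active connections as the measure of "traffic":

```python
import hashlib
from itertools import cycle

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical server pool

def pick_ip_hash(client_ip: str) -> str:
    # The same client IP always maps to the same server.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

_round_robin = cycle(SERVERS)

def pick_round_robin() -> str:
    # Hands out servers in a fixed cycle, ignoring their current load.
    return next(_round_robin)

def pick_least_traffic(active_connections: dict[str, int]) -> str:
    # Picks the server currently handling the fewest connections.
    return min(active_connections, key=active_connections.get)
```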
DB:
primary-replica replication:
one primary DB handles all writes (and can serve reads)
the replicas are read-only copies that serve reads
vs
primary-primary
all are read and write
worker queue will need to know all IPs, which adds complexity
increased latency from keeping data consistent between all primary DBs
sharding - based on geolocation
each DB is sharded based on geolocation
reduce latency to clients
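Geolocation sharding and replication combine into a simple routing rule. A sketch, assuming the usual primary-replica arrangement where the primary takes writes and replicas serve reads; the region keys, node names, and the fallback region are all hypothetical.

```python
import random

# Hypothetical topology: one shard per region, each with a primary and replicas.
SHARDS = {
    "us": {"primary": "us-primary", "replicas": ["us-replica-1", "us-replica-2"]},
    "eu": {"primary": "eu-primary", "replicas": ["eu-replica-1", "eu-replica-2"]},
}
DEFAULT_REGION = "us"  # assumption: unknown regions fall back to one shard

def route(region: str, is_write: bool) -> str:
    shard = SHARDS.get(region, SHARDS[DEFAULT_REGION])
    if is_write:
        return shard["primary"]               # writes always go to the primary
    return random.choice(shard["replicas"])   # reads are spread over replicas
```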
Failure scenarios/bottlenecks
The cache is one failure scenario. With write-behind, if the cache goes down before it hands a write to the worker queue, that write is lost. We can mitigate this with backup caches in an active-passive failover setup, where a standby takes over if the primary goes down.
The worker queue is another failure scenario. If it goes down before the queued writes are saved to the MongoDB databases, that data is lost.
Future improvements