System requirements


Functional:

- users can post, edit, delete tweets

- other users can like, reply, and repost

- users can add friends

- users should be able to block people as well

- users may also be able to attach media (photos, videos) to tweets



Non-Functional:

Performance:

- low latency

- high concurrency

Availability:

- availability > consistency

- target 99.99% availability

- NoSQL database: availability, eventual consistency


Scalability:

- load balancing that keeps up as load increases

- document model DB like Mongo for tweets and media

- graph database for friend and block relationships


Security:

- allow users to stay logged in on their devices

- allow users to reset password if they forgot

- allow users to delete account



Capacity estimation

- 1 million active users daily

- 1000 bytes per tweet

- 10 tweets per user per day

- 1,000,000 users * 10 tweets * 1000 bytes = 10,000,000,000 bytes / day = 10,000 MB / day = 10 GB / day

- 10 GB / day * 365 = 3650 GB / year = 3.65 TB / year

- 3.65 TB / year * 5 = 18.25 TB over 5 years
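
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate in these notes: it counts tweet text only (no media, indexes, or replication overhead) and uses decimal units (1 GB = 1000 MB). The constant and function names are made up for the example.

```python
# Back-of-the-envelope storage estimate using the figures from the notes.
DAILY_ACTIVE_USERS = 1_000_000
TWEETS_PER_USER_PER_DAY = 10
BYTES_PER_TWEET = 1_000

GB = 10**9   # decimal units, matching the notes
TB = 10**12


def storage_bytes(days: int) -> int:
    """Total tweet bytes written over `days` days (text only, no media)."""
    return DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET * days


per_day_gb = storage_bytes(1) / GB
per_year_tb = storage_bytes(365) / TB
five_year_tb = storage_bytes(365 * 5) / TB

print(f"{per_day_gb:.0f} GB/day, {per_year_tb:.2f} TB/year, "
      f"{five_year_tb:.2f} TB over 5 years")
```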





API design

/createTweet

(userId: UUID, message: string, media: array)

for each elem in media array

upload elem to S3 bucket (or similar cloud service)

if unsuccessful, return error

else, take the URL from cloud

create a new UUID for the photo / video

add new document to photos or videos collection respectively

create new tweet document

save tweet document into tweet collection DB

return error if unsuccessful, else return 200
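
A minimal sketch of the /createTweet steps, assuming in-memory dicts in place of the real collections and a hypothetical `fake_upload` function in place of S3. The shape of each media element (`{"kind": ..., "data": ...}`), the URL format, and the status codes are assumptions made for the example, not part of the design above.

```python
import uuid
from typing import Callable

# In-memory stand-ins for the photos, videos, and tweet collections.
photos: dict[str, dict] = {}
videos: dict[str, dict] = {}
tweets: dict[str, dict] = {}


def fake_upload(data: bytes) -> str:
    """Hypothetical uploader; a real one would call the object store."""
    if not data:
        raise IOError("upload failed")
    return f"https://media.example/{uuid.uuid4()}"


def create_tweet(user_id: str, message: str, media: list[dict],
                 upload: Callable[[bytes], str] = fake_upload) -> tuple[int, str]:
    """Return (status, tweet_id) on success or (status, error message)."""
    photo_ids, video_ids = [], []
    for elem in media:  # elem is assumed to be {"kind": "photo"|"video", "data": bytes}
        try:
            url = upload(elem["data"])
        except IOError:
            return 500, "media upload failed"
        media_id = str(uuid.uuid4())
        if elem["kind"] == "photo":
            photos[media_id] = {"photoId": media_id, "userId": user_id, "photoUrl": url}
            photo_ids.append(media_id)
        else:
            videos[media_id] = {"videoId": media_id, "userId": user_id, "videoUrl": url}
            video_ids.append(media_id)
    tweet_id = str(uuid.uuid4())
    tweets[tweet_id] = {"tweetId": tweet_id, "userId": user_id, "text": message,
                        "photos": photo_ids, "videos": video_ids}
    return 200, tweet_id
```

Note that this sketch does not clean up media that was uploaded before a later element failed; a real implementation would need to decide how to handle that.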


/updateTweet

(userId: UUID, newMessage: string, tweetId: UUID)

check if tweetId exists in tweet collection

if not, return error

else, update text in the respective tweet document

return error if unsuccessful, else 200


/deleteTweet

(userId: UUID, tweetId: UUID)

check if tweetId exists in DB

if not, return error

else, check if this tweet has photos and videos

if yes, delete from cloud

if successful, finally delete tweet from tweet collection

return error if unsuccessful, else 200
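
The /deleteTweet flow could look roughly like the sketch below. The collections are plain dicts passed in as arguments, and `delete_from_cloud` is a hypothetical callback that returns True on success. The ownership check (403) is an added assumption; the steps above do not mention it.

```python
from typing import Callable


def delete_tweet(user_id: str, tweet_id: str, tweets: dict, photos: dict,
                 videos: dict, delete_from_cloud: Callable[[str], bool]) -> int:
    """Return an HTTP-style status: 404, 403, 500, or 200."""
    tweet = tweets.get(tweet_id)
    if tweet is None:
        return 404
    if tweet["userId"] != user_id:
        return 403  # assumption: only the author may delete (not in the notes)
    # Delete media from cloud storage first, then the tweet itself.
    for store, url_key, ids in ((photos, "photoUrl", tweet["photos"]),
                                (videos, "videoUrl", tweet["videos"])):
        for media_id in ids:
            doc = store.get(media_id)
            if doc is None:
                continue  # already removed by an earlier, partially failed attempt
            if not delete_from_cloud(doc[url_key]):
                return 500  # keep the tweet so the delete can be retried
            del store[media_id]
    del tweets[tweet_id]
    return 200
```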


/repostTweet

(userId: UUID, tweetId: UUID)

check if tweetId exists in DB

if not, return error

else, check if this tweet has photos and videos

if yes, save a copy of each for this new user

if successful, finally create a new tweet for this user in the tweet collection

return error if unsuccessful, else 200


/addFriend

(userId, friendId)

see if there exists a document in friends collection

if yes, return error saying already friends

else, create a new document

return error if unsuccessful, else 200


/removeFriend

(userId, friendId)

see if there exists a document in friends collection

if no, return error saying not friends

else, delete the existing document

return error if unsuccessful, else 200
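
As a sketch of /addFriend and /removeFriend, the friends collection can be keyed by the sorted pair of user IDs so that (A, B) and (B, A) refer to the same connection. The in-memory dict, the sorted-pair key, and the 409/404 status codes are assumptions made for the example.

```python
import uuid

# In-memory stand-in for the friends collection, keyed by the sorted ID pair.
friends: dict[tuple[str, str], dict] = {}


def _pair(user_id: str, friend_id: str) -> tuple[str, str]:
    # Sort so that (a, b) and (b, a) map to the same connection.
    a, b = sorted((user_id, friend_id))
    return (a, b)


def add_friend(user_id: str, friend_id: str) -> tuple[int, str]:
    key = _pair(user_id, friend_id)
    if key in friends:
        return 409, "already friends"
    friends[key] = {"connectionId": str(uuid.uuid4()),
                    "userA": key[0], "userB": key[1]}
    return 200, "ok"


def remove_friend(user_id: str, friend_id: str) -> tuple[int, str]:
    key = _pair(user_id, friend_id)
    if key not in friends:
        return 404, "not friends"
    del friends[key]
    return 200, "ok"
```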


/blockUser

(userId, blockedUserId)

see if there exists a document in blocked collection

if yes, return error saying already blocked

else, create new document in blocked collection

return error if unsuccessful, else 200


/uploadPhoto


/uploadVideo


/deletePhoto


/deleteVideo





Database design


users

userId UUID

firstName string

lastName string


tweet

tweetId UUID

userId UUID

text string

photos arrayOfPhotoUUIDs

videos arrayOfVideoUUIDs


photos

photoId UUID

userId UUID

photoUrl string

uploadDate datetime


videos

videoId UUID

userId UUID

videoUrl string

uploadDate datetime


friends

connectionId UUID

userA UUID

userB UUID


blocked

userId UUID

blocked arrayOfUserUUIDs


sessions

userId UUID

ipAddress string

lastOnline datetime

expires datetime
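
For reference, the schemas above can be written as Python dataclasses. This is only a typed restatement of the fields listed in these notes; the class names and the `is_expired` helper on `Session` are illustrative additions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID


@dataclass
class User:
    userId: UUID
    firstName: str
    lastName: str


@dataclass
class Tweet:
    tweetId: UUID
    userId: UUID
    text: str
    photos: list[UUID] = field(default_factory=list)
    videos: list[UUID] = field(default_factory=list)


@dataclass
class Photo:
    photoId: UUID
    userId: UUID
    photoUrl: str
    uploadDate: datetime


@dataclass
class Video:
    videoId: UUID
    userId: UUID
    videoUrl: str
    uploadDate: datetime


@dataclass
class Friendship:
    connectionId: UUID
    userA: UUID
    userB: UUID


@dataclass
class BlockList:
    userId: UUID
    blocked: list[UUID] = field(default_factory=list)


@dataclass
class Session:
    userId: UUID
    ipAddress: str
    lastOnline: datetime
    expires: datetime

    def is_expired(self, now: datetime) -> bool:
        """Illustrative helper: a session is expired once `now` reaches `expires`."""
        return now >= self.expires
```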





High-level design








Request flows







Detailed component design







Trade offs/Tech choices

cache:

choosing write-behind

add/update entry to cache

async add to DB


vs


write-through

add/update entry to cache

cache adds/updates entry to DB

writes are synchronous, so cache and DB stay in sync

however, cache becomes the point of failure


vs


cache-aside

check cache for result

check DB if not in cache

add result to cache

send response to client


each cache miss adds a delay, which hurts UX
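
The three caching strategies compared above can be sketched side by side. The classes below use a plain dict as the DB and a deque as a stand-in for the worker queue; all class and method names are made up for the example, and a real write-behind cache would drain the queue from a background worker rather than an explicit `flush()` call.

```python
from collections import deque


class WriteThroughCache:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}

    def put(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # synchronous: DB is updated before put() returns


class WriteBehindCache:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}
        self.pending = deque()  # stands in for the worker queue

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))  # DB write is deferred

    def flush(self):
        """Drain queued writes into the DB (a background worker would do this)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value


class CacheAside:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.db[key]     # cache miss: extra round trip to the DB
        self.cache[key] = value  # populate the cache for the next read
        return value
```

With write-behind, anything still in `pending` when the cache process dies is lost, which is the trade-off accepted here in exchange for faster writes.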


LB:

- IP hash: LB chooses the server based on a hash of the client IP


vs


Round robin:

traffic is distributed to each server in a cycle

this does not take into account the current load on a server when giving a new task; uneven distribution of traffic


vs


least-traffic

LB chooses the server with the least traffic

more complexity, does not guarantee that connected server is closest to client; may add latency
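
A small sketch of the three load-balancing strategies, to make the differences concrete. "Least traffic" is modeled here as the fewest active connections, which is an assumption; server names and the `pick`/`release` interface are invented for the example.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    def __init__(self, servers):
        self._cycle = cycle(list(servers))

    def pick(self, client_ip: str) -> str:
        return next(self._cycle)  # ignores both the client and current load


class IPHash:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip: str) -> str:
        # A stable hash means the same client IP always lands on the same server.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip: str) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` closes."""
        self.active[server] -= 1
```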


DB:

primary-replica replication:

one primary DB handles writes (and can also serve reads)

the replicas are just for reading

vs


primary-primary

all are read and write

worker queue will need to know all IPs, which adds complexity

increased latency from keeping data consistent across all primary DBs

sharding - based on geolocation

each DB is sharded based on geolocation

reduce latency to clients
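
A rough sketch of how primary-replica replication and geolocation sharding could fit together: each region gets its own shard, writes go to that shard's primary, and reads are served by its replicas. The dict-backed stores, region names, and class names are all assumptions for illustration.

```python
import random


class ReplicatedShard:
    """One primary that takes writes plus read-only replicas."""

    def __init__(self, replica_count: int = 2):
        self.primary: dict = {}
        self.replicas: list[dict] = [{} for _ in range(replica_count)]

    def write(self, key, value) -> None:
        self.primary[key] = value
        # Replication is shown as immediate for simplicity; in practice it is
        # usually asynchronous, so replica reads can briefly be stale.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        return random.choice(self.replicas).get(key)


class GeoRouter:
    """Routes each request to the shard for the client's region."""

    def __init__(self, regions):
        self.shards = {region: ReplicatedShard() for region in regions}

    def shard_for(self, region: str) -> ReplicatedShard:
        return self.shards[region]
```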


Failure scenarios/bottlenecks

The cache is a single point of failure. Because the design uses write-behind, if the cache goes down before it puts a transaction onto the worker queue, that data is lost. We can remedy this with backup caches: with active-passive failover, the backup takes over for the primary cache should the primary go down.
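
Active-passive failover for the cache might be sketched as follows. The `alive` flag stands in for a real health check, and copying every write to the standby is an assumption added so that the backup already holds the data when it takes over; all names are hypothetical.

```python
class CacheNode:
    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}
        self.alive = True  # stands in for a real health check


class FailoverCache:
    def __init__(self, primary: CacheNode, backup: CacheNode):
        self.active, self.passive = primary, backup

    def _ensure_active(self) -> None:
        # Promote the standby if the active node has gone down.
        if not self.active.alive:
            if not self.passive.alive:
                raise RuntimeError("no healthy cache node")
            self.active, self.passive = self.passive, self.active

    def put(self, key, value) -> None:
        self._ensure_active()
        self.active.data[key] = value
        if self.passive.alive:
            # Assumption: writes are copied to the standby to keep it warm.
            self.passive.data[key] = value

    def get(self, key):
        self._ensure_active()
        return self.active.data.get(key)
```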


The worker queue is another failure scenario. If it goes down before the queued writes are saved to the MongoDB databases, that data is lost.


Future improvements
