My Solution for Design Yelp or Nearby Friends with Score: 9/10

by utopia4715

System requirements


Functional:

- User can find businesses around their location.

  • User can use keywords, category, price range to refine search.

- User can give address to find businsses.

- User can write reviews on the businesses.

- User can read reviews on the businesses.



Non-Functional:

- Response Time under 2s on search

- Scalable: 100M DAU, whole US

- Eventual consistency


Assumptions:

- I assume business data are already in the 

database. 


Capacity estimation

Numbers: 

- 10M businesses

- For each business, 1KB of medadata

- 10GB of data. Grows into 100GB in 2 years. 

- Reviews: 100KB per business

- 1TB of data. Grows into 10TB in 2 years. 




API design

API: 

- search(user_id, location, criteria, keyword)

 -> returns JSON list of businesses

- read_review(user_id, business_id, start, end)

- write_review(user_id, score, review_text)




Database design

Document DB like Mongo DB would be suitable because it has secondary indices (unlike some other NoSQL databases), and more horizontally scalable than relational databases.


Business doc:

- business_id

- location

- area

- type

- price

- review_ids [normalize or denormalize?]


Review doc:

- timestamp

- business_id

- user_id

- score

- review_text


High-level design

See diagram





Request flows

Request from Client are forwarded to an appropriete service by API Gateway, according to the request URI.


Search Service handles search requests, using businesses data in Document DB and Cache.


Review Service writes reviews to Document DB. Message Queue is in between the service and the DB to absort sudden burst of writes.


Text Search Service, like Elastic Search, provides keyword search capability.



Detailed component design

search algorithm

- Search Service figures out a user's area using location.

- It looks up businesses using area, category, price range.

- We use secondary indices, which are supported by MongoDB.

- Then, it does more fine grained search for the nearest businesses, using locations.

- It then further refine the results using text based search of Elastic Search.



Trade offs/Tech choices


For the data store, Document DB like Mongo DB would be suitable because:


  • It is more horizontally scalable than relational databases.
  • Schema is more flexible than relational database, allowing us to pick the schema to optimize query performance.
  • it has secondary indices to optimize query performance. This is an advantage over Log Structured Merge based NoSQL databases such as Cassandra, Hbase.



Failure scenarios/bottlenecks

- User is on the edge of an area. -> Need to search user's own area + neighboring area. 

- One area becomes so dense with businesses. -> Needed to divide the area further. Consistent hashing to allow re-partitioning by area.

- Burst of reviews (writes) to a particular business. -> Message Queue

- More scale in general -> Horizontally scale all components. Partition data using area.  


*** Maintainability

- Monitor all components.

- Alerts on response time too slow, server resource issues, server crashes.

- Fault tolerance. Back up business data 3 ways (same data center, same region, different region for disaster recovery). 

- Services are stateless. Duplicate them and have an automatic recovery. Cache, DB, message queue should have the same.





Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?