My Solution for Design Yelp or Nearby Friends with Score: 9/10
by utopia4715
System requirements
Functional:
- User can find businesses around their location.
- User can use keywords, category, price range to refine search.
- User can give an address to find businesses.
- User can write reviews on the businesses.
- User can read reviews on the businesses.
Non-Functional:
- Response Time under 2s on search
- Scalable: 100M DAU, whole US
- Eventual consistency
Assumptions:
- I assume business data are already in the database.
Capacity estimation
Numbers:
- 10M businesses
- For each business, 1KB of metadata
- 10GB of data. Grows into 100GB in 2 years.
- Reviews: 100KB per business
- 1TB of data. Grows into 10TB in 2 years.
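The arithmetic behind these totals can be checked with a short script. This is only a sketch: it assumes decimal units (1 KB = 1,000 bytes) and takes the per-business sizes above at face value. The variable names are illustrative.

```python
# Back-of-the-envelope storage estimate.
# Assumption: decimal units (1 KB = 1,000 bytes).
KB = 1_000
GB = 1_000 ** 3
TB = 1_000 ** 4

NUM_BUSINESSES = 10_000_000        # 10M businesses
METADATA_PER_BUSINESS = 1 * KB     # 1 KB of metadata per business
REVIEWS_PER_BUSINESS = 100 * KB    # 100 KB of reviews per business

metadata_total_gb = NUM_BUSINESSES * METADATA_PER_BUSINESS / GB
reviews_total_tb = NUM_BUSINESSES * REVIEWS_PER_BUSINESS / TB

print(f"Business metadata: {metadata_total_gb:.0f} GB")
print(f"Reviews: {reviews_total_tb:.0f} TB")
```

Both two-year figures (100GB and 10TB) correspond to a 10x growth factor over today's totals.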
API design
API:
- search(user_id, location, criteria, keyword)
-> returns JSON list of businesses
- read_review(user_id, business_id, start, end)
- write_review(user_id, business_id, score, review_text)
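As a rough illustration, the two review calls could look like the typed Python functions below. This is a sketch, not a definitive interface: write_review takes a business_id here so that a review can be tied to a business, start and end are treated as list offsets for paging, the 1 to 5 score range is an assumption, and the in-memory dict only stands in for the real data store.

```python
from dataclasses import dataclass


@dataclass
class Review:
    user_id: str
    business_id: str
    score: int
    review_text: str


# In-memory stand-in for the review store (illustration only).
_reviews: dict[str, list[Review]] = {}


def write_review(user_id: str, business_id: str, score: int, review_text: str) -> Review:
    """Store a review for a business. The 1-5 score range is an assumption."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    review = Review(user_id, business_id, score, review_text)
    _reviews.setdefault(business_id, []).append(review)
    return review


def read_review(user_id: str, business_id: str, start: int, end: int) -> list[Review]:
    """Return reviews [start, end) for a business, treating start/end as offsets."""
    return _reviews.get(business_id, [])[start:end]
```

Passing start and end lets the client page through reviews instead of fetching all 100KB for a business at once.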
Database design
A document DB like MongoDB would be suitable because it has secondary indices (unlike some other NoSQL databases) and is more horizontally scalable than relational databases.
Business doc:
- business_id
- location
- area
- type
- price
- review_ids [normalize or denormalize?]
Review doc:
- timestamp
- business_id
- user_id
- score
- review_text
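To make the two document shapes concrete, here is a small sketch using plain Python dicts. All field values are invented for illustration, and the "area" key format is hypothetical. It shows the normalized option, where each review carries business_id and the business doc holds no review_ids list; the dict built at the end only emulates what a secondary index on business_id would provide.

```python
from collections import defaultdict

# Example Business doc (values are made up for illustration).
business = {
    "business_id": "b1",
    "location": {"lat": 37.7749, "lon": -122.4194},
    "area": "cell_377_-1225",  # hypothetical area key
    "type": "restaurant",
    "price": 2,
}

# Example Review docs, each pointing back to its business.
reviews = [
    {"timestamp": 1700000000, "business_id": "b1", "user_id": "u1",
     "score": 5, "review_text": "Great tacos."},
    {"timestamp": 1700000500, "business_id": "b1", "user_id": "u2",
     "score": 3, "review_text": "Slow service."},
]

# Emulates a secondary index on Review.business_id.
reviews_by_business = defaultdict(list)
for review in reviews:
    reviews_by_business[review["business_id"]].append(review)


def latest_reviews(business_id, limit=10):
    """Return the newest reviews for a business, newest first."""
    matches = reviews_by_business.get(business_id, [])
    return sorted(matches, key=lambda r: r["timestamp"], reverse=True)[:limit]
```

With this layout, a burst of reviews adds new Review docs without rewriting the Business doc, at the cost of a second lookup when reading reviews.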
High-level design
See diagram
Request flows
Requests from the client are forwarded by the API Gateway to the appropriate service, according to the request URI.
Search Service handles search requests, using businesses data in Document DB and Cache.
Review Service writes reviews to Document DB. A Message Queue sits between the service and the DB to absorb sudden bursts of writes.
Text Search Service, like Elasticsearch, provides keyword search capability.
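The buffering role of the Message Queue on the review write path can be sketched with Python's standard queue module. This is a single-process stand-in, not a real broker: the queue capacity, the batch size, and the function names submit_review and drain_batch are all assumptions made for illustration.

```python
import queue

# Bounded in-process queue standing in for the Message Queue (capacity is assumed).
review_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

# Plain list standing in for the Document DB.
db_reviews: list = []


def submit_review(review: dict) -> bool:
    """Called by the Review Service: enqueue without blocking the request."""
    try:
        review_queue.put_nowait(review)
        return True
    except queue.Full:
        # Queue is saturated; the caller could retry or reject the request.
        return False


def drain_batch(max_batch: int = 100) -> int:
    """Called by a DB writer: move up to max_batch reviews into the store."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(review_queue.get_nowait())
        except queue.Empty:
            break
    db_reviews.extend(batch)
    return len(batch)
```

The service returns as soon as a review is enqueued, while the writer drains at a rate the DB can sustain. This fits the eventual consistency requirement, since a new review may not be readable immediately.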
Detailed component design
search algorithm
- Search Service figures out a user's area using location.
- It looks up businesses using area, category, price range.
- We use secondary indices, which are supported by MongoDB.
- Then, it does a more fine-grained search for the nearest businesses, using locations.
- It then further refines the results using the text-based search of Elasticsearch.
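The steps above can be sketched end to end in plain Python. This is a simplified illustration under several assumptions: areas are fixed grid cells of 0.1 degrees, the Business class and its name field are hypothetical, list comprehensions stand in for index lookups, and a substring match stands in for the text search engine.

```python
import math
from dataclasses import dataclass

CELL_DEG = 0.1  # assumed grid cell size in degrees


@dataclass
class Business:
    business_id: str
    name: str       # hypothetical field used for keyword matching
    lat: float
    lon: float
    category: str
    price: int


def area_of(lat, lon):
    """Map a location to its grid cell (the 'area')."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def search(businesses, lat, lon, category=None, max_price=None, keyword=None, limit=10):
    area = area_of(lat, lon)
    # Coarse filter by area, category, and price (what secondary indices would serve).
    candidates = [
        b for b in businesses
        if area_of(b.lat, b.lon) == area
        and (category is None or b.category == category)
        and (max_price is None or b.price <= max_price)
    ]
    # Fine-grained step: order candidates by distance from the user.
    candidates.sort(key=lambda b: haversine_km(lat, lon, b.lat, b.lon))
    # Keyword refinement (stand-in for the text search service).
    if keyword:
        kw = keyword.lower()
        candidates = [b for b in candidates if kw in b.name.lower()]
    return candidates[:limit]
```

Filtering by area first keeps the distance computation limited to a small candidate set rather than all 10M businesses.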
Trade offs/Tech choices
For the data store, a document DB like MongoDB would be suitable because:
- It is more horizontally scalable than relational databases.
- Its schema is more flexible than a relational database's, allowing us to pick the schema that optimizes query performance.
- It has secondary indices to optimize query performance. This is an advantage over Log-Structured Merge (LSM) based NoSQL databases such as Cassandra and HBase, where secondary index support is more limited.
Failure scenarios/bottlenecks
- User is on the edge of an area. -> Need to search the user's own area + neighboring areas.
- One area becomes too dense with businesses. -> Need to divide the area further. Consistent hashing to allow re-partitioning by area.
- Burst of reviews (writes) to a particular business. -> Message Queue
- More scale in general -> Horizontally scale all components. Partition data using area.
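For the edge-of-area case, one simple approach is to search the user's cell plus its eight neighbors. The sketch below assumes areas are fixed grid cells of 0.1 degrees (an assumption, the size is not specified above) and ignores wrap-around at the poles and the antimeridian, which does not matter for a US-only service.

```python
import math

CELL_DEG = 0.1  # assumed grid cell size in degrees


def cell_of(lat, lon):
    """Map a location to its (row, col) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cells_to_search(lat, lon):
    """Return the user's cell and its 8 neighbors (a 3x3 block)."""
    row, col = cell_of(lat, lon)
    return [
        (row + d_row, col + d_col)
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
    ]
```

A user just south of a cell boundary then still sees a business a few meters north of it, at the cost of querying nine areas instead of one.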
Maintainability
- Monitor all components.
- Alert on slow response times, server resource issues, and server crashes.
- Fault tolerance. Back up business data 3 ways (same data center, same region, different region for disaster recovery).
- Services are stateless. Run multiple replicas of each with automatic recovery. The cache, DB, and message queue should be replicated with automatic recovery as well.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?