My Solution for Design Yelp or Nearby Friends with Score: 8/10
by iridescent_luminous693
System requirements
Functional Requirements
- User Management
- Allow users to sign up, log in, and manage profiles.
- Enable users to save favorite locations and preferences.
- Business Listings
- Display information for local establishments (e.g., name, category, address, contact, opening hours).
- Allow users to search, filter, and sort businesses by location, category, and rating.
- Allow business owners to claim and update their listings.
- Review System
- Enable users to write, edit, and delete reviews for businesses.
- Allow users to rate businesses using a standardized system (e.g., stars).
- Display aggregated ratings and reviews for each business.
- Search and Recommendation
- Provide search functionality with filters (e.g., price range, proximity, popularity).
- Offer personalized recommendations based on user history and preferences.
- Photo Upload
- Allow users to upload photos for businesses as part of their reviews.
- Display user-uploaded photos on business profiles.
- Location-Based Services
- Enable users to search for nearby businesses using geolocation.
- Show businesses on an interactive map.
- Analytics and Insights
- Display analytics to business owners (e.g., views, clicks, and reviews).
- Provide administrators with system-wide statistics.
- Notifications
- Notify users of responses to their reviews or updates on businesses they follow.
- Inform business owners of new reviews or changes in their listing.
- Moderation
- Implement mechanisms for flagging inappropriate reviews or photos.
- Provide admins with tools to review and act on reported content.
- Offline Access
- Allow users to save businesses or reviews for offline access.
Non-Functional Requirements
- Performance
- Support low-latency responses for search and listing queries (<200ms).
- Handle ~10,000 concurrent users during peak traffic.
- Scalability
- Ensure horizontal scalability to handle increasing users and reviews.
- Use sharded databases and distributed search engines.
- Availability
- Provide 99.9% uptime with failover and redundancy mechanisms.
- Security
- Protect user data with encryption (at rest and in transit).
- Prevent spam and fraud with rate limiting and CAPTCHA.
- Usability
- Offer an intuitive user interface across web and mobile platforms.
- Support multiple languages and regional preferences.
- Data Consistency
- Ensure strong consistency for user-submitted reviews and ratings.
- Support eventual consistency for search indices and recommendations.
- Maintainability
- Use modular architecture for easier updates and feature additions.
- Provide detailed logging and monitoring for debugging.
- Privacy
- Comply with privacy regulations like GDPR.
- Allow users to manage and delete their reviews or personal data.
- Extensibility
- Enable integration with third-party APIs for reservations or promotions.
- Support APIs for developers to build extensions or analytics.
- Search Optimization
- Use a search engine (e.g., Elasticsearch) to handle text and geospatial queries.
- Provide fast autocomplete and relevance-based ranking.
Capacity estimation
1. User Base
- Monthly Active Users (MAU): ~10 million.
- Daily Active Users (DAU): ~1 million.
- Concurrent Users: Assume 1% of DAU active at the same time → ~10,000 concurrent users.
2. Business Listings
- Total Listings: ~1 million businesses.
- Average Reviews/Business: 100 reviews/business.
- Photos/Business: 50 photos/business.
- Total Data Size:
- Reviews: 1 million * 100 reviews * 1 KB/review ≈ 100 GB.
- Photos: 1 million * 50 photos * 500 KB/photo ≈ 25 TB.
3. Reviews
- Daily Reviews: ~1 million users, 5 reviews/day → ~5 million reviews/day.
- Storage Requirements:
- 5 million reviews/day * 1 KB/review = ~5 GB/day → ~1.8 TB/year.
4. Photos
- Daily Photo Uploads: ~2 million photos/day.
- Storage Requirements:
- 2 million photos/day * 500 KB/photo = ~1 TB/day → ~365 TB/year.
5. Search Queries
- Search Volume: ~10 searches/user/day.
- Total Queries/Day: 1 million users * 10 searches = 10 million queries/day.
- Peak Queries/Second: ~10,000 QPS during peak hours (the daily average is 10 million / 86,400 s ≈ 116 QPS, so this allows for large spikes).
6. Notifications
- Notifications:
- ~5 million daily reviews generate ~5 million notifications.
- Assume 10% of users enable push notifications → ~500,000 push notifications/day.
7. Database Size
- User Data:
- 10 million users * 10 KB/user = 100 GB.
- Business Data:
- 1 million businesses * 50 KB/business = 50 GB.
- Search Index:
- 1 million businesses + 100 million reviews ≈ 10 GB.
8. Bandwidth
- Review and Photo Uploads:
- ~1 TB/day for photos (2 million photos × 500 KB) + ~5 GB/day for reviews → ~1.005 TB/day.
- Search Requests:
- 10 million queries/day * ~5 KB/query = ~50 GB/day.
- Notifications:
- 500,000 notifications * 1 KB = ~500 MB/day.
9. Compute Requirements
- API Gateway:
- Handle ~10,000 concurrent requests.
- Search Service:
- Serve ~10,000 QPS with low latency (<200 ms).
- Photo Service:
- Upload/download ~2 TB/day.
Summary
- Storage:
- Total: ~370 TB/year of new data (365 TB photos + ~1.8 TB reviews + metadata); plan for ~500 TB/year to leave headroom.
- Traffic:
- ~2 TB/day for photo uploads and downloads + ~100 GB/day for other activities.
- Compute:
- ~10,000 concurrent users, 10,000 QPS for search, scalable to higher peaks.
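The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch that uses only the assumptions stated above: 1 million DAU, 5 reviews and 10 searches per user per day, 2 million photos per day, and 1 KB / 500 KB / 5 KB per review, photo, and query. Units are decimal.

```python
# Back-of-envelope check of the capacity estimates, using only the stated assumptions.
# All sizes are expressed in KB with decimal units: 1 GB = 10**6 KB, 1 TB = 10**9 KB.
KB, GB, TB = 1, 10**6, 10**9

DAU = 1_000_000
REVIEWS_PER_USER_PER_DAY = 5
SEARCHES_PER_USER_PER_DAY = 10
PHOTOS_PER_DAY = 2_000_000

review_storage_per_day = DAU * REVIEWS_PER_USER_PER_DAY * 1 * KB    # 1 KB per review
photo_storage_per_day = PHOTOS_PER_DAY * 500 * KB                   # 500 KB per photo
search_traffic_per_day = DAU * SEARCHES_PER_USER_PER_DAY * 5 * KB   # ~5 KB per query
avg_search_qps = DAU * SEARCHES_PER_USER_PER_DAY / 86_400            # 86,400 seconds per day

print(f"reviews: {review_storage_per_day / GB:.1f} GB/day, "
      f"{review_storage_per_day * 365 / TB:.3f} TB/year")
print(f"photos:  {photo_storage_per_day / TB:.1f} TB/day, "
      f"{photo_storage_per_day * 365 / TB:.0f} TB/year")
print(f"search:  {search_traffic_per_day / GB:.0f} GB/day, average {avg_search_qps:.0f} QPS")
```

The script reproduces the 5 GB/day, 1 TB/day, and 50 GB/day figures. It also shows that average search load is about 116 QPS, so the 10,000 QPS peak figure budgets for spikes roughly 86× the daily average.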
API design
1. User Management APIs
- POST /user/signup
- Registers a new user with email, password, and optional profile details.
- POST /user/login
- Authenticates a user and returns an access token.
- GET /user/profile
- Fetches the user’s profile details.
- PUT /user/profile
- Updates the user’s profile information.
- DELETE /user/account
- Deletes the user’s account.
2. Business Listing APIs
- GET /business/search
- Search businesses with filters (e.g., location, category, rating).
- GET /business/{id}
- Fetch details for a specific business.
- POST /business/claim
- Allows business owners to claim their business listing.
- POST /business/create
- Adds a new business listing (admin or verified owner only).
- PUT /business/{id}
- Updates business details (admin or verified owner only).
3. Review APIs
- POST /review
- Submit a new review for a business.
- GET /business/{id}/reviews
- Fetch reviews for a specific business.
- PUT /review/{id}
- Edit an existing review.
- DELETE /review/{id}
- Remove a review.
4. Photo APIs
- POST /photo
- Upload a photo for a specific business.
- GET /business/{id}/photos
- Retrieve all photos for a business.
- DELETE /photo/{id}
- Remove a photo (admin or uploader only).
5. Search and Recommendation APIs
- GET /search/autocomplete
- Fetch autocomplete suggestions for search queries.
- GET /search/popular
- Fetch popular searches or trending locations.
- GET /recommendations
- Fetch personalized recommendations for the user.
6. Notification APIs
- GET /notifications
- Retrieve user notifications (e.g., new reviews, responses).
- POST /notifications/mark-read
- Mark a notification as read.
- POST /notifications/flag
- Report inappropriate notifications or responses.
7. Moderation APIs
- POST /moderation/flag
- Flag inappropriate reviews or photos.
- GET /moderation/queue
- Fetch flagged content for admin review.
- PUT /moderation/{id}
- Approve or reject flagged content.
8. Analytics APIs
- GET /business/{id}/analytics
- Provide analytics for a specific business (views, clicks, reviews).
- GET /admin/analytics
- Platform-wide statistics for administrators.
9. Miscellaneous APIs
- GET /location/nearby
- Fetch nearby businesses using geolocation.
- GET /categories
- Retrieve a list of supported business categories.
- GET /offline/download
- Download offline data for a region.
Database design
1. User Database
- Attributes Stored and Keys:
- users table:
- Attributes: user_id (PK), name, email (unique), password_hash, preferences, created_at.
- Primary Key (PK): user_id.
- No Foreign Keys (FK).
- Purpose:
- Manage user profiles, authentication details, and preferences.
- Tech Used:
- PostgreSQL
- Reason:
- Relational model ensures strong consistency and supports complex queries for user data.
2. Business Database
- Attributes Stored and Keys:
- businesses table:
- Attributes: business_id (PK), name, category, location, contact_info, claimed_by (FK).
- Primary Key (PK): business_id.
- Foreign Key (FK): claimed_by references users.user_id. Users live in a separate PostgreSQL database, so the application enforces this reference rather than MySQL.
- Purpose:
- Store business listings and metadata for search and filtering.
- Tech Used:
- MySQL
- Reason:
- Optimized for structured data with frequent queries, sorting, and filtering.
3. Review and Rating Database
- Attributes Stored and Keys:
- reviews collection:
- Attributes: review_id (PK), business_id (FK), user_id (FK), rating, review_text, created_at, updated_at.
- Primary Key (PK): review_id.
- Foreign Keys (FK): business_id references businesses.business_id; user_id references users.user_id. MongoDB does not enforce foreign keys, so the Review Management Service validates these references on write.
- Purpose:
- Manage user-submitted reviews, ratings, and associated metadata.
- Tech Used:
- MongoDB
- Reason:
- Flexible schema for dynamic and diverse review content.
4. Photo Storage
- Attributes Stored and Keys:
- photos table:
- Attributes: photo_id (PK), business_id (FK), uploader_id (FK), photo_url, created_at.
- Primary Key (PK): photo_id.
- Foreign Keys (FK): business_id references businesses.business_id; uploader_id references users.user_id. The business_id reference points into the MySQL Business Database, so the application checks it.
- Purpose:
- Store metadata of user-uploaded photos for businesses.
- Tech Used:
- AWS S3 (photo storage) and PostgreSQL (metadata).
- Reason:
- Object storage for scalable photo handling, relational DB for efficient metadata queries.
5. Search Index Database
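- Attributes Stored and Keys:
- search_index:
- Attributes: business_id, name, category, location, keywords.
- No primary or foreign keys, as it is an index. business_id can serve as the Elasticsearch document _id, so re-indexing a business overwrites its existing document.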
- Purpose:
- Support full-text and geospatial search for businesses and reviews.
- Tech Used:
- Elasticsearch
- Reason:
- Optimized for fast, relevance-based search and proximity queries.
6. Analytics Database
- Attributes Stored and Keys:
- business_analytics table:
- Attributes: business_id, views, clicks, reviews_count, timestamp.
- Primary Key (PK): (business_id, timestamp), so each business has one row per reporting period rather than a single row overall.
- platform_analytics table:
- Attributes: metric_id (PK), active_users, reviews_posted, photos_uploaded, timestamp.
- Primary Key (PK): metric_id.
- Purpose:
- Track business performance and system-wide metrics for analysis.
- Tech Used:
- Amazon Redshift or Google BigQuery
- Reason:
- Optimized for OLAP queries and large-scale analytics.
High-level design
1. User Interface (UI)
- Overview:
- Web and mobile applications that users interact with to explore businesses, write reviews, and view recommendations.
- Features:
- User-friendly search, filters, and business browsing.
- Features for adding reviews, uploading photos, and viewing analytics (for business owners).
2. API Gateway
- Overview:
- Acts as a single entry point for all client requests, routing them to the appropriate services.
- Features:
- Authentication, rate limiting, request validation, and routing.
- Centralized monitoring and logging for all requests.
3. User Management Service
- Overview:
- Handles user registration, login, profile management, and authentication.
- Features:
- Password hashing, token management, and session handling.
4. Business Management Service
- Overview:
- Manages business listings, categories, and ownership claims.
- Features:
- CRUD operations for businesses, owner verification, and updates.
5. Review Management Service
- Overview:
- Handles user reviews and ratings for businesses.
- Features:
- CRUD operations for reviews, flagging inappropriate content, and aggregating ratings.
6. Photo Management Service
- Overview:
- Manages uploading, storing, and retrieving photos associated with businesses.
- Features:
- Processes metadata and integrates with a scalable storage service (e.g., AWS S3).
7. Search Service
- Overview:
- Enables users to search for businesses, reviews, and categories.
- Features:
- Full-text search, geospatial queries, and autocomplete functionality.
8. Recommendation Service
- Overview:
- Provides personalized business and review recommendations for users.
- Features:
- Uses machine learning models to analyze user behavior and preferences.
9. Notification Service
- Overview:
- Sends updates and alerts to users and business owners about reviews, responses, and flagged content.
- Features:
- Push notifications, emails, and real-time alerts.
10. Moderation Service
- Overview:
- Handles flagged reviews and photos for inappropriate content.
- Features:
- Enables admins to review and act on flagged content.
11. Analytics and Reporting Service
- Overview:
- Aggregates user interaction data and generates insights for business owners and admins.
- Features:
- Tracks views, clicks, and reviews; generates platform-wide statistics.
12. Search Index Database
- Overview:
- Stores indexed data for fast text and geospatial searches.
- Features:
- Optimized for proximity-based and relevance-based queries.
13. Offline Access Service
- Overview:
- Allows users to download and access business information offline.
- Features:
- Provides cached business details, reviews, and photos.
14. Data Pipeline
- Overview:
- Processes and aggregates data for analytics and machine learning.
- Features:
- Handles real-time data ingestion and batch processing.
15. Load Balancer
- Overview:
- Distributes incoming traffic across servers to ensure high availability and performance.
- Features:
- Fault tolerance and traffic management during peak loads.
16. Monitoring and Logging Service
- Overview:
- Tracks system performance and logs user activities for debugging and insights.
- Features:
- Alerts for system health issues and detailed activity logs.
Request flows
1. User Registration
- User Interface:
- User inputs name, email, and password on the registration page.
- API Gateway:
- Routes the request to the User Management Service.
- User Management Service:
- Validates input and checks for duplicate email.
- Hashes the password and creates a new user entry in the User Database.
- Database Interaction:
- Inserts user details into the User Database.
- Response:
- Returns a success message to the user.
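The "hashes the password" step can be sketched with the standard library's PBKDF2. This is a minimal sketch under assumptions: the "salt$hash" storage format for the password_hash column and the iteration count are illustrative choices, not part of the design above. bcrypt or Argon2 (via third-party packages) are common alternatives.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for illustration; tune it so one hash takes a noticeable fraction of a second.
ITERATIONS = 600_000

def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 with a random 16-byte salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(digest.hex(), digest_hex)
```

The random per-user salt means two users with the same password get different password_hash values. The constant-time comparison avoids leaking how many characters of a guess matched.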
2. Business Search
- User Interface:
- User enters search terms and filters on the search bar.
- API Gateway:
- Routes the request to the Search Service.
- Search Service:
- Queries the Search Index Database for matching businesses.
- Applies filters like category, location, and rating.
- Database Interaction:
- Fetches relevant business data from the Business Database.
- Response:
- Sends the list of matching businesses back to the user.
3. Submit a Review
- User Interface:
- User submits a review for a specific business.
- API Gateway:
- Routes the request to the Review Management Service.
- Review Management Service:
- Validates the review content and user authentication.
- Saves the review in the Review Database.
- Database Interaction:
- Updates the Search Index with the new review.
- Updates aggregated ratings for the business in the Business Database.
- Notification Service:
- Sends a notification to the business owner about the new review.
- Response:
- Confirms the review submission to the user.
4. Upload a Photo
- User Interface:
- User selects and uploads a photo for a business.
- API Gateway:
- Routes the request to the Photo Management Service.
- Photo Management Service:
- Processes and stores the photo in Object Storage (e.g., AWS S3).
- Saves metadata (e.g., uploader_id, business_id, URL) in the Photo Database.
- Database Interaction:
- Links the photo to the corresponding business in the database.
- Response:
- Confirms the upload and displays the photo to the user.
5. Business Claim by Owner
- User Interface:
- Business owner submits a claim request for their business.
- API Gateway:
- Routes the request to the Business Management Service.
- Business Management Service:
- Validates the owner’s identity.
- Records the claim and, once verification passes, updates the ownership status in the Business Database.
- Notification Service:
- Notifies the admin team for verification, if required.
- Response:
- Confirms the claim request submission to the owner.
6. Notification Retrieval
- User Interface:
- User checks notifications via the app or website.
- API Gateway:
- Routes the request to the Notification Service.
- Notification Service:
- Fetches unread notifications from the Notification Database.
- Response:
- Displays notifications to the user.
7. Flagging Inappropriate Content
- User Interface:
- User flags a review or photo as inappropriate.
- API Gateway:
- Routes the request to the Moderation Service.
- Moderation Service:
- Saves the flagged content in the Moderation Queue Database.
- Notification Service:
- Notifies the admin team for review.
- Response:
- Confirms the flag submission to the user.
8. Analytics Retrieval
- User Interface:
- Business owner or admin views analytics.
- API Gateway:
- Routes the request to the Analytics Service.
- Analytics Service:
- Fetches data from the Analytics Database.
- Response:
- Displays detailed insights to the user.
9. Geolocation-Based Search
- User Interface:
- User opts to search nearby businesses.
- API Gateway:
- Routes the request to the Search Service.
- Search Service:
- Queries the Search Index for businesses near the user’s geolocation.
- Database Interaction:
- Retrieves business details from the Business Database.
- Response:
- Displays a map with nearby businesses.
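Elasticsearch's geo_point type and geo_distance query handle proximity search internally. The geohashing idea listed among the Search Service data structures can still be illustrated in a few lines. This is a sketch of standard geohash encoding, not the index's actual implementation. Nearby points usually share a prefix, so a prefix acts as a spatial bucket key.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a point by repeatedly halving the lon/lat ranges; 5 bits per output character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        value, rng = (lon, lon_range) if use_lon else (lat, lat_range)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

A longer geohash only refines a shorter one, so truncating the user's hash selects a larger search cell. Two points close to a cell boundary can still have different prefixes, which is why a nearby search also checks the eight neighboring cells.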
Detailed component design
1. Search Service
1. End-to-End Working
- Accepts user queries via the API Gateway.
- Parses query parameters (keywords, filters like location or category).
- Queries the Search Index Database (Elasticsearch) for relevant matches using full-text and geospatial search.
- Applies filters (e.g., category, proximity) and ranks results based on relevance.
- Returns the results to the user.
2. Data Structures and Algorithms
- Inverted Index: Maps keywords to document IDs for efficient text search.
- R-Trees/Geohashing: For spatial indexing to handle proximity-based queries.
- TF-IDF or BM25: Scoring algorithms to rank search results by relevance.
- Trie Structure: For efficient autocomplete suggestions.
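A minimal sketch of trie-based autocomplete follows. It assumes each indexed term carries a popularity score (hypothetical here), and suggestions under a prefix are ranked by that score.

```python
from typing import Dict, List, Optional, Tuple

class TrieNode:
    __slots__ = ("children", "score")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.score: Optional[float] = None  # set only where a complete term ends

class AutocompleteTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, term: str, score: float) -> None:
        node = self.root
        for ch in term.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.score = score

    def suggest(self, prefix: str, k: int = 5) -> List[str]:
        """Return up to k complete terms under `prefix`, most popular first."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk of the subtree below the prefix, collecting complete terms.
        matches: List[Tuple[float, str]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.score is not None:
                matches.append((current.score, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [text for _, text in matches[:k]]
```

Walking the whole subtree is fine for a sketch. Production autocomplete usually caches the top-k terms at each node so a lookup costs only the length of the prefix.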
3. Handling Peak Traffic (Scaling)
- Sharded Index: Divides search data across nodes by region or category to distribute load.
- Caching: Frequently searched queries are cached in Redis for faster responses.
- Horizontal Scaling: Additional Elasticsearch nodes are deployed to handle increased query volume.
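The Redis query cache follows the cache-aside pattern: look up a normalized key, and on a miss query the index and store the result with an expiry. The sketch below uses an in-process dictionary as a stand-in for Redis, where the equivalent would be a GET followed by SET with an expiry (for example `set(key, value, ex=ttl)` in redis-py). The key format and TTL are assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """In-process stand-in for a Redis search cache (cache-aside with expiry)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # hit: entry is still fresh
        value = compute()                    # miss or expired: query the search index
        self._store[key] = (now + self.ttl, value)
        return value

def cache_key(query: str, category: str, geohash_prefix: str) -> str:
    """Hypothetical key scheme: normalize the query so equivalent searches share an entry."""
    return f"search:{query.strip().lower()}:{category}:{geohash_prefix}"
```

Including a coarse location cell in the key keeps cached results local. The TTL bounds how stale results can be, which fits the eventual consistency the requirements allow for search indices.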
4. Edge Cases and Handling
- Case 1: Typo in Search Query:
- Handling: Implements fuzzy search with Levenshtein distance to suggest corrections.
- Case 2: No Results Found:
- Handling: Suggests popular businesses or categories as fallback.
- Case 3: High Query Volume:
- Handling: Rate limits users during extreme surges and prioritizes logged-in users.
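Elasticsearch's fuzzy matching does typo handling natively, and by default it also counts transpositions. The underlying Levenshtein distance is shown below as a plain dynamic program, with a hypothetical helper that suggests corrections from a vocabulary within an edit budget.

```python
from typing import Iterable, List

def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def suggest_corrections(term: str, vocabulary: Iterable[str], max_distance: int = 2) -> List[str]:
    """Vocabulary words within `max_distance` edits of `term`, closest first."""
    term = term.lower()
    scored = sorted((levenshtein(term, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_distance]
```

Keeping only two rows of the table uses O(min(len(a), len(b))) memory. Elasticsearch's `fuzziness: AUTO` similarly scales the allowed edits with term length, so very short terms must match exactly.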
2. Review Management Service
1. End-to-End Working
- Accepts review submissions via the API Gateway.
- Validates review content and user authentication.
- Stores reviews in the Review Database (MongoDB).
- Updates the Search Index with the review for improved business visibility.
- Sends notifications to business owners and relevant users.
2. Data Structures and Algorithms
- Document Store: MongoDB’s flexible schema for storing diverse review content.
- TTL Indexes: Automatically deletes flagged or temporary reviews after a set period.
- Aggregation Pipelines: Efficiently calculates business ratings and review counts.
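A MongoDB aggregation pipeline can recompute a business's rating with `$group` using `$avg` and `$sum` over its reviews. A common complement, swapped in here for illustration, is to keep running totals that are updated on every review write. The sketch keeps a sum and a count per business, so the average shown on the listing never requires a rescan. A periodic pipeline run can then correct any drift.

```python
from dataclasses import dataclass

@dataclass
class RatingAggregate:
    """Running totals for one business (hypothetical fields stored alongside the listing)."""
    rating_sum: int = 0
    review_count: int = 0

    def add(self, rating: int) -> None:
        self.rating_sum += rating
        self.review_count += 1

    def edit(self, old_rating: int, new_rating: int) -> None:
        # An edited review changes the sum but not the count.
        self.rating_sum += new_rating - old_rating

    def remove(self, rating: int) -> None:
        self.rating_sum -= rating
        self.review_count -= 1

    @property
    def average(self) -> float:
        return self.rating_sum / self.review_count if self.review_count else 0.0
```

Storing the sum rather than the average makes edits and deletes exact, because the old rating can be subtracted without rounding error.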
3. Handling Peak Traffic (Scaling)
- Partitioned Collections: Shards reviews by business ID to distribute writes.
- Write Queues: Temporarily queues reviews for delayed processing during heavy loads.
- Horizontal Scaling: Adds MongoDB nodes to scale read and write capacity.
4. Edge Cases and Handling
- Case 1: Duplicate Reviews:
- Handling: Enforces uniqueness by combining user ID and business ID.
- Case 2: Profanity or Inappropriate Content:
- Handling: Filters reviews with text-analysis models and flags inappropriate ones.
- Case 3: Review Flooding:
- Handling: Rate limits review submissions to prevent spam.
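One standard way to rate-limit review submissions is a token bucket per user. This is a minimal single-process sketch; the limits (a burst of 3 reviews, then one every 10 minutes) are hypothetical. With several service instances, the bucket state would need to live in a shared store such as Redis.

```python
import time
from typing import Callable, Dict

class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class ReviewRateLimiter:
    """One bucket per user ID (hypothetical limits: burst of 3, then 1 review per 600 s)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = self.buckets[user_id] = TokenBucket(3, 1 / 600, self.clock)
        return bucket.allow()
```

A token bucket tolerates a legitimate user posting a few reviews in a row. It still caps sustained flooding at the refill rate.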
3. Photo Management Service
1. End-to-End Working
- Processes photo uploads from users.
- Stores photos in Object Storage (AWS S3/Google Cloud Storage).
- Saves metadata (e.g., uploader ID, business ID) in the Photo Database.
- Links photos to the corresponding business profile.
2. Data Structures and Algorithms
- Object Metadata: Stored in relational format for efficient indexing and retrieval.
- Content Delivery Networks (CDN): Ensures fast photo delivery to end-users.
- Chunked Uploads: Splits large files into smaller parts for efficient uploads.
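Chunked uploads split a file into numbered parts with per-part checksums, so only failed parts need to be re-sent. Amazon S3's multipart upload uses the same shape: parts are numbered from 1 and must be at least 5 MiB, except the last. The sketch below is a local illustration of that idea, not the S3 API.

```python
import hashlib
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, the S3 multipart minimum for all but the last part

def split_into_parts(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (part_number, chunk, sha256_hex), numbering parts from 1."""
    for part_number, start in enumerate(range(0, len(data), chunk_size), start=1):
        chunk = data[start:start + chunk_size]
        yield part_number, chunk, hashlib.sha256(chunk).hexdigest()

def reassemble(received: Dict[int, Tuple[bytes, str]], total_parts: int) -> bytes:
    """Verify every part arrived intact, then concatenate them in order.

    Raises ValueError listing the part numbers the client must re-send.
    """
    bad = [n for n in range(1, total_parts + 1)
           if n not in received
           or hashlib.sha256(received[n][0]).hexdigest() != received[n][1]]
    if bad:
        raise ValueError(f"missing or corrupt parts: {bad}")
    return b"".join(received[n][0] for n in range(1, total_parts + 1))
```

Because each part is verified independently, a network failure midway through a large upload costs one retried part instead of the whole file.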
3. Handling Peak Traffic (Scaling)
- CDN Integration: Caches frequently accessed photos at the edge for low-latency delivery.
- Object Storage Partitioning: Distributes photos across multiple storage buckets.
- Upload Queueing: Handles burst traffic by queueing uploads during surges.
4. Edge Cases and Handling
- Case 1: Large File Uploads:
- Handling: Implements chunked uploads and retries for network failures.
- Case 2: Duplicate Photo Uploads:
- Handling: Uses hash-based deduplication to avoid redundant storage.
- Case 3: Photo Abuse (e.g., offensive content):
- Handling: Integrates AI-based moderation tools to flag inappropriate images.
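Hash-based deduplication amounts to content addressing: store each distinct file once under its SHA-256 digest, and let each photo's metadata row point at that digest. The sketch below uses dictionaries as stand-ins for object storage and the Photo Database; the structure is illustrative.

```python
import hashlib
from typing import Dict

class PhotoStore:
    """Content-addressed store: byte-identical uploads are kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}        # stand-in for object storage
        self.photo_to_blob: Dict[str, str] = {}  # photo_id -> content hash (metadata row)

    def upload(self, photo_id: str, data: bytes) -> bool:
        """Record the photo; return True if new bytes were stored, False if deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        self.photo_to_blob[photo_id] = digest
        if digest in self.blobs:
            return False
        self.blobs[digest] = data
        return True
```

An exact hash only catches byte-identical files. A resized or re-encoded copy of the same image hashes differently and would need perceptual hashing to detect.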
4. Notification Service
1. End-to-End Working
- Receives events like new reviews or flagged content.
- Queues notifications in the Notification Queue.
- Delivers notifications to users via push notifications or email.
- Updates notification statuses in the Notification Database.
2. Data Structures and Algorithms
- Priority Queues: Ensures high-priority notifications (e.g., flagged content) are delivered first.
- Redis Pub/Sub: Real-time delivery of notifications.
- Event-Driven Architecture: Processes notifications asynchronously for scalability.
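A priority queue for notifications can be built on a binary heap. In this sketch the priority levels are hypothetical, and a sequence counter breaks ties so notifications of equal priority stay in arrival order.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Hypothetical priority levels: a lower number is delivered first.
PRIORITY = {"flagged_content": 0, "review_response": 1, "new_review": 2, "digest": 3}

class NotificationQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within the same priority

    def push(self, kind: str, message: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), message))

    def pop(self) -> Optional[str]:
        """Return the most urgent pending message, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Both push and pop are O(log n). In a Kafka or RabbitMQ setup, a similar effect is often achieved with separate queues per priority that workers drain in order.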
3. Handling Peak Traffic (Scaling)
- Message Queues: Uses RabbitMQ/Kafka to buffer notifications during spikes.
- Parallel Processing: Processes notification batches with worker nodes.
- Fallback Mechanisms: Switches to delayed email notifications during service outages.
4. Edge Cases and Handling
- Case 1: Undelivered Notifications:
- Handling: Implements retries with exponential backoff.
- Case 2: Notification Flooding:
- Handling: Groups multiple notifications into a single digest for the user.
- Case 3: Failed Delivery Channels:
- Handling: Switches between push, SMS, or email channels based on availability.
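Retries with exponential backoff and channel fallback can be combined as sketched below. Each channel is retried with capped, jittered delays before the next channel (for example push, then SMS, then email) is tried. The delay parameters are assumptions, and transient failures are modeled as ConnectionError.

```python
import random
import time
from typing import Callable, Optional, Sequence

def send_with_retries(send: Callable[[], None], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> bool:
    """Call `send` until it succeeds, backing off exponentially between attempts.

    Only ConnectionError is retried; any other exception (e.g. an invalid
    device token) propagates immediately because retrying would not help.
    """
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            send()
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break
            # "Full jitter": wait a random time up to the capped exponential delay.
            sleep(rng.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    return False

def deliver(channels: Sequence[Callable[[], None]], **retry_options) -> Optional[int]:
    """Try channels in order; return the index of the one that succeeded, or None."""
    for index, channel in enumerate(channels):
        if send_with_retries(channel, **retry_options):
            return index
    return None
```

Randomizing the delay spreads retries out, so many workers recovering from the same outage do not hit the push gateway in lockstep.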
5. Business Management Service
1. End-to-End Working
- Handles CRUD operations for business listings.
- Allows business owners to claim listings after verification.
- Updates business information in the Business Database and Search Index.
2. Data Structures and Algorithms
- Relational Tables: Efficiently store structured data like business details.
- Verification Workflow: Processes claims with step-by-step verification checks.
3. Handling Peak Traffic (Scaling)
- Read Replicas: Serves high read traffic from replicas to offload the primary database.
- Async Index Updates: Processes search index updates in batches to reduce load.
- Horizontal Scaling: Scales business management microservices to handle concurrent updates.
4. Edge Cases and Handling
- Case 1: Duplicate Business Listings:
- Handling: Detects listings whose normalized metadata (e.g., name, address, phone number) match and merges them into one listing.
- Case 2: Fake Business Claims:
- Handling: Requires ownership proofs (e.g., utility bills or government-issued IDs).
- Case 3: Large Updates (e.g., bulk edits):
- Handling: Processes updates in smaller batches to avoid database overload.
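Duplicate-listing detection can start by grouping listings on a normalized key. The matching rule below is a hypothetical one: the same normalized name, street address, and last 10 phone digits. Real systems usually add fuzzy name matching and geographic distance, and send candidates for review before merging.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List

def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())

def dedup_key(listing: Dict[str, str]) -> str:
    # Hypothetical rule: normalized name + address + last 10 digits of the phone number.
    phone = re.sub(r"\D", "", listing.get("phone", ""))[-10:]
    return f"{normalize(listing['name'])}|{normalize(listing['address'])}|{phone}"

def find_duplicates(listings: List[Dict[str, str]]) -> List[List[str]]:
    """Return groups of business_ids that share a dedup key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for listing in listings:
        groups[dedup_key(listing)].append(listing["business_id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Grouping by key is a single linear pass, which also fits the batched processing described for bulk updates.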
Trade offs/Tech choices
MongoDB for Reviews:
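- Trade-Off: Flexible schema for diverse review structures, but foreign keys are not enforced. Strongly consistent reads also require reading from the primary with majority read and write concerns, which adds latency.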
- Reason: Reviews are dynamic and vary in content; MongoDB's schema flexibility handles this efficiently.
Elasticsearch for Search:
- Trade-Off: High resource usage but provides fast and accurate full-text and geospatial search.
- Reason: Optimized for real-time search and filtering across large datasets.
AWS S3 for Photo Storage:
- Trade-Off: Higher latency compared to local storage but highly scalable and cost-effective for large volumes of photos.
- Reason: Ensures durability and easy integration with CDNs for global delivery.
PostgreSQL for Users:
- Trade-Off: More complex scaling compared to NoSQL, but provides strong consistency.
- Reason: Relational structure fits well with user authentication and profile management.
Event-Driven Notifications:
- Trade-Off: Adds complexity with message queues but decouples services for scalability and fault tolerance.
- Reason: Handles high notification volumes asynchronously for better performance.
Failure scenarios/bottlenecks
Search Index Overload:
- Issue: High query volume slows down Elasticsearch.
- Mitigation: Implement caching for frequent queries and shard the index.
Photo Storage Latency:
- Issue: Slow uploads or retrievals during traffic surges.
- Mitigation: Use CDNs and chunked uploads for scalability.
Database Overload:
- Issue: High read/write traffic on relational databases.
- Mitigation: Use read replicas, query optimization, and caching.
Notification Delays:
- Issue: Message queues back up during peaks.
- Mitigation: Scale workers and prioritize notifications.
Review Spam:
- Issue: Excessive or inappropriate reviews overwhelm the system.
- Mitigation: Use rate limiting and automated moderation tools.
Service Downtime:
- Issue: API Gateway or key services become unavailable.
- Mitigation: Use load balancers, failover servers, and health checks.
Flagged Content Abuse:
- Issue: Users maliciously flag valid reviews or photos.
- Mitigation: Implement user reputation scoring and manual moderation.
Scaling Bottlenecks:
- Issue: Inefficient horizontal scaling under sudden spikes.
- Mitigation: Pre-scale resources during predicted high traffic.
Future improvements
Enhanced Search Optimization:
- Improvement: Use ML-based ranking for personalized search results.
- Mitigation: Reduces search index bottlenecks with smarter query handling.
AI-Powered Moderation:
- Improvement: Deploy AI models for review and photo moderation.
- Mitigation: Minimizes flagged content abuse and reduces manual workload.
Real-Time Analytics:
- Improvement: Introduce real-time pipelines for instant business insights.
- Mitigation: Handles database overload by offloading analytics.
Auto-Scaling for Peak Traffic:
- Improvement: Implement predictive auto-scaling for all services.
- Mitigation: Prepares the system for sudden traffic spikes.
Enhanced Caching Strategy:
- Improvement: Expand caching for popular searches, reviews, and photos.
- Mitigation: Alleviates search index and database load during peaks.
Improved Notification Handling:
- Improvement: Group similar notifications and introduce digest alerts.
- Mitigation: Prevents queue backlogs and reduces delivery delays.
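Digest alerts can be produced by grouping pending notifications for the same user and kind within a fixed time window into one message. This is a minimal sketch; the Notification fields and the one-hour tumbling window are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Notification:
    user_id: str
    kind: str        # e.g. "new_review"
    subject: str     # e.g. the business name
    created_at: int  # epoch seconds

def build_digests(pending: List[Notification], window_seconds: int = 3600) -> List[str]:
    """Collapse same-user, same-kind notifications in one time window into a single message."""
    groups: Dict[Tuple[str, str, int], List[Notification]] = defaultdict(list)
    for n in pending:
        groups[(n.user_id, n.kind, n.created_at // window_seconds)].append(n)
    digests = []
    for (user_id, kind, _), items in groups.items():
        if len(items) == 1:
            digests.append(f"{user_id}: {kind} on {items[0].subject}")
        else:
            subjects = sorted({n.subject for n in items})
            digests.append(f"{user_id}: {len(items)} {kind} notifications ({', '.join(subjects)})")
    return digests
```

Fixed windows are simple but can split a burst that straddles a boundary into two digests. A sliding or per-user timer avoids that at the cost of more state.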
Offline Capabilities:
- Improvement: Allow users to download more robust offline data.
- Mitigation: Reduces reliance on real-time data during outages.