Codemia | Master System Design Interviews Through Active Practice

My Solution for Design a Notification System with Score: 8/10

by nectar4678

Requirements

To design a notification system, we must first understand how it will be used. The system should:

Manage user preferences for notification types and delivery channels.
Schedule notifications across time zones so they are sent at the right local time.
Retry failed deliveries robustly, using strategies such as exponential backoff and fallback channels.
Provide authenticated APIs that external systems can use to send notifications.
Log errors, track delivery status, and process delivery reports.
Support union and intersection of channels to handle multi-channel notifications.
Handle peak loads with caching, load balancing, and distributed processing.
Prevent duplicate notifications and handle edge cases such as time zone differences and third-party failures.
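The authenticated API requirement could be met in several ways. The sketch below verifies an HMAC-SHA256 request signature and uses a timestamp window to limit replays. HMAC signing is an assumed scheme, since the design does not name one. `CLIENT_SECRETS`, `sign_request`, and `authenticate` are hypothetical names. The `hmac` and `hashlib` calls are standard library.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical per-client secrets; a real deployment would load these from a secrets store.
CLIENT_SECRETS = {"billing-service": b"example-secret"}
MAX_SKEW_SECONDS = 300  # reject signatures more than 5 minutes off (replay protection)


def sign_request(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex-encoded."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authenticate(client_id: str, timestamp: int, body: bytes, signature: str,
                 now: Optional[int] = None) -> bool:
    """Return True only for a known client whose signature matches and is fresh."""
    secret = CLIENT_SECRETS.get(client_id)
    if secret is None:
        return False
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(secret, timestamp, body)
    # compare_digest is designed to resist timing attacks on the comparison
    return hmac.compare_digest(expected, signature)


body = b'{"user_id": 42, "template": "order_shipped"}'
ts = int(time.time())
signature = sign_request(CLIENT_SECRETS["billing-service"], ts, body)
print(authenticate("billing-service", ts, body, signature))         # True
print(authenticate("billing-service", ts, body + b" ", signature))  # False: body was altered
```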

Define Core Objects

Based on the requirements, the main objects in the system include:

User: Represents recipients and stores their preferences, time zone, and notification history.
Notification: Contains the notification payload, delivery status, priority, and associated channels.
Channel: Encapsulates logic for communication mediums and fallback strategies.
Scheduler: Manages scheduling tasks, respecting time zones and priority.
RetryManager: Handles retry logic, limits, and fallback strategies.
Template: Stores and formats content dynamically based on user preferences and channels.
Logger: Captures and manages errors, delivery reports, and retries.
APIService: Provides secured endpoints for external services to interact with the system.
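The Template object is not implemented in the class code later on. One minimal way to sketch it assumes one text body per channel and `$placeholder` substitution via Python's `string.Template`. The class layout and the example template are illustrative, not part of the original design.

```python
from string import Template as TextTemplate


class Template:
    """Hypothetical sketch: one body per channel, filled with per-user values."""

    def __init__(self, name, bodies):
        self.name = name
        # bodies: {channel: "text with $placeholders"}
        self.bodies = {channel: TextTemplate(text) for channel, text in bodies.items()}

    def render(self, channel, **values):
        if channel not in self.bodies:
            raise KeyError(f"template {self.name!r} has no body for channel {channel!r}")
        # substitute() raises KeyError for a missing value, surfacing bad payloads early
        return self.bodies[channel].substitute(**values)


order_shipped = Template("order_shipped", {
    "email": "Hi $name, your order $order_id has shipped and should arrive by $eta.",
    "sms": "Order $order_id shipped. ETA $eta.",
})
print(order_shipped.render("sms", order_id="A123", eta="Friday"))  # Order A123 shipped. ETA Friday.
```

Keeping a separate body per channel lets the SMS text stay short while the email version carries more detail.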

Analyze Relationships

User and Preferences: Preferences are stored in a database, fetched at runtime, and cached for performance.
Notification and Status: Each notification tracks its delivery status, updated asynchronously as delivery reports are processed.
Notification and Channels: Union and intersection logic determines which channels are used based on preferences.
RetryManager and Notification: Retried notifications are updated with attempt counts and final statuses.
Scheduler and Time Zones: Notifications are scheduled based on user time zones, considering daylight saving adjustments.
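The union/intersection rule is not spelled out above, so the following sketch assumes one reading of it:

- Intersection keeps only the requested channels the user has enabled, which respects opt-outs.
- Union sends on every requested channel plus any channel the user has enabled, which may suit critical alerts.

`CombineMode` and `resolve_channels` are hypothetical names. `preferences` uses the same `{channel: enabled}` shape as `User.preferences`.

```python
from enum import Enum


class CombineMode(Enum):
    UNION = "union"                # requested channels plus every channel the user enabled
    INTERSECTION = "intersection"  # only requested channels the user has enabled


def resolve_channels(requested, preferences, mode):
    """Return the sorted list of channels to deliver on.

    requested:   channels the caller asked for, e.g. {"email", "sms"}
    preferences: {channel: enabled_bool}, the same shape as User.preferences
    """
    enabled = {channel for channel, on in preferences.items() if on}
    if mode is CombineMode.INTERSECTION:
        chosen = set(requested) & enabled
    else:
        chosen = set(requested) | enabled
    return sorted(chosen)


prefs = {"email": True, "sms": False, "push": True}
print(resolve_channels({"email", "sms"}, prefs, CombineMode.INTERSECTION))  # ['email']
print(resolve_channels({"email", "sms"}, prefs, CombineMode.UNION))         # ['email', 'push', 'sms']
```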

Establish Hierarchy

Channel Hierarchy: Create a base Channel class with common attributes and methods. Derive EmailChannel, SMSChannel, and PushChannel for channel-specific logic.
Notification Type Hierarchy: Define a base Notification class and extend it for different types like TransactionalNotification and PromotionalNotification.
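A minimal sketch of both hierarchies follows. It uses Python's `abc` module so the base `Channel` cannot be instantiated directly. Several details are illustrative assumptions:

- the shared `deliver()` wrapper;
- the 160-character SMS limit;
- the priority values on the notification subclasses.

Real `send()` methods would call provider APIs instead of returning strings.

```python
from abc import ABC, abstractmethod


class Channel(ABC):
    """Base class holding logic shared by every channel."""

    name = "base"
    max_length = None  # optional per-channel message limit

    def deliver(self, recipient, message):
        # Shared step: enforce the channel's length limit, then call the channel-specific send()
        if self.max_length is not None:
            message = message[: self.max_length]
        return self.send(recipient, message)

    @abstractmethod
    def send(self, recipient, message):
        """Channel-specific delivery; a real subclass would call a provider API."""


class EmailChannel(Channel):
    name = "email"

    def send(self, recipient, message):
        return f"email to {recipient}: {message}"


class SMSChannel(Channel):
    name = "sms"
    max_length = 160  # single-segment limit for GSM-7 encoded SMS

    def send(self, recipient, message):
        return f"sms to {recipient}: {message}"


class PushChannel(Channel):
    name = "push"

    def send(self, recipient, message):
        return f"push to {recipient}: {message}"


class Notification:
    priority = "Normal"

    def __init__(self, user_id, content):
        self.user_id = user_id
        self.content = content


class TransactionalNotification(Notification):
    priority = "High"  # hypothetical: receipts, password resets


class PromotionalNotification(Notification):
    priority = "Low"   # hypothetical: marketing campaigns


for channel in (EmailChannel(), SMSChannel(), PushChannel()):
    print(channel.deliver("user-42", "Your order has shipped"))
```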

Design Patterns

Channel Hierarchy: Base Channel class with specific implementations for EmailChannel, SMSChannel, PushChannel, etc.
Notification Type Hierarchy: Extendable base Notification class for different notification types.
Retry Strategies: Strategy pattern for retry logic, allowing dynamic injection of strategies (e.g., exponential backoff).
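The Strategy pattern can be sketched as below. `RetryManager` receives a `RetryStrategy` object and asks it for each delay. Exponential backoff, a fixed delay, or a jittered variant can therefore be swapped in without changing the manager. This is a hypothetical variant of the `RetryManager` in the class code, and the `cap` and `jitter` parameters are assumptions added for illustration.

```python
import random
from abc import ABC, abstractmethod


class RetryStrategy(ABC):
    @abstractmethod
    def delay(self, attempt):
        """Seconds to wait before retry number `attempt` (1-based)."""


class FixedDelay(RetryStrategy):
    def __init__(self, seconds):
        self.seconds = seconds

    def delay(self, attempt):
        return self.seconds


class ExponentialBackoff(RetryStrategy):
    def __init__(self, base=1, cap=300, jitter=False):
        self.base = base
        self.cap = cap
        self.jitter = jitter

    def delay(self, attempt):
        d = min(self.cap, self.base * 2 ** attempt)
        # "Full jitter": a random delay in [0, d] spreads out retries from many workers
        return random.uniform(0, d) if self.jitter else d


class RetryManager:
    def __init__(self, strategy, max_attempts=3):
        self.strategy = strategy  # injected, so policies can change without editing this class
        self.max_attempts = max_attempts

    def next_delay(self, attempts_so_far):
        """Delay before the next attempt, or None once retries are exhausted."""
        if attempts_so_far >= self.max_attempts:
            return None
        return self.strategy.delay(attempts_so_far + 1)


backoff = RetryManager(ExponentialBackoff(base=1, cap=60))
print([backoff.next_delay(n) for n in range(4)])  # [2, 4, 8, None]
```

Capping the delay keeps late retries from waiting without bound. Jitter spreads retries out when a provider outage fails many notifications at once.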

Define Class Members (write code)

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

class User:
    def __init__(self, user_id, preferences, time_zone):
        self.user_id = user_id
        self.preferences = preferences  # {channel_name: True/False}
        self.time_zone = time_zone  # IANA zone name, e.g. "America/New_York"

    def get_preferences(self):
        # Simulate fetching preferences from a database
        return self.preferences

class Notification:
    def __init__(self, user, content, channels, priority="Normal", scheduled_time=None):
        self.user = user
        self.content = content
        self.channels = channels
        self.priority = priority
        self.scheduled_time = scheduled_time  # naive datetime in the user's local time; None = send now
        self.status = "Pending"
        self.attempts = 0

    def update_status(self, new_status):
        self.status = new_status

class Scheduler:
    def schedule(self, notification):
        # Convert the user's local send time to UTC for scheduling
        utc_time = self.convert_to_utc(notification.user.time_zone, notification.scheduled_time)
        # Add to a job queue (e.g., RabbitMQ) that fires at utc_time
        print(f"Scheduled notification at {utc_time} for user {notification.user.user_id}")
        return utc_time

    def convert_to_utc(self, time_zone, local_time):
        if local_time is None:
            return datetime.now(timezone.utc)  # no schedule requested: send immediately
        # Attach the user's zone (zoneinfo applies DST rules), then convert to UTC
        return local_time.replace(tzinfo=ZoneInfo(time_zone)).astimezone(timezone.utc)

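class RetryManager:
    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts

    def retry(self, notification):
        if notification.attempts >= self.max_attempts:
            notification.update_status("Failed")
            Logger.log_error(f"Notification for user {notification.user.user_id} failed after {notification.attempts} attempts")
            return None
        notification.attempts += 1
        # Exponential backoff: 2, 4, 8 seconds for attempts 1, 2, 3
        delay = 2 ** notification.attempts
        print(f"Retrying in {delay} seconds for user {notification.user.user_id}")
        return delay  # caller re-enqueues the notification after this delay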

class Logger:
    @staticmethod
    def log_error(message):
        # Write to an error log
        print(f"ERROR: {message}")

    @staticmethod
    def log_event(message):
        # Write to an event log
        print(f"EVENT: {message}")

Adhere to SOLID Guidelines

Single Responsibility: Each class handles one responsibility, such as sending a notification or managing retries.
Open/Closed Principle: The system is open to new channels by extending the Channel class without modifying existing code.
Liskov Substitution: Derived channel classes can replace the base Channel class wherever required.
Interface Segregation: Channel exposes a narrow interface (essentially send), so concrete channels are not forced to implement functionality they do not need.
Dependency Inversion: High-level modules like NotificationService depend on abstractions like Channel rather than concrete implementations.
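To make the dependency-inversion point concrete, here is a hypothetical `NotificationService`. It depends only on a `Channel` protocol (`typing.Protocol`, i.e. structural typing) and falls back to the next channel when one fails. The fallback-in-order behavior and the `FakeChannel` test double are assumptions for illustration.

```python
from typing import List, Optional, Protocol


class Channel(Protocol):
    """Abstraction the service depends on; any object with `name` and `send` qualifies."""

    name: str

    def send(self, recipient: str, message: str) -> bool: ...


class NotificationService:
    """High-level module: knows nothing about email, SMS, or push providers."""

    def __init__(self, channels: List[Channel]):
        self.channels = channels  # in priority order; later entries are fallbacks

    def notify(self, recipient: str, message: str) -> Optional[str]:
        """Try each channel in turn; return the name of the one that succeeded, or None."""
        for channel in self.channels:
            if channel.send(recipient, message):
                return channel.name
        return None  # all channels failed; hand off to the retry manager


class FakeChannel:
    """Test double that satisfies Channel structurally, without inheriting from it."""

    def __init__(self, name: str, succeeds: bool):
        self.name = name
        self.succeeds = succeeds

    def send(self, recipient: str, message: str) -> bool:
        return self.succeeds


service = NotificationService([FakeChannel("push", succeeds=False), FakeChannel("sms", succeeds=True)])
print(service.notify("user-42", "Your login code is 123456"))  # sms
```

Because the service only needs `name` and `send`, tests can pass in fakes, and a new provider needs no change to `NotificationService`.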

Consider Scalability and Flexibility

The design can handle large-scale notifications by:

Caching Strategies: Use Redis to cache user preferences and notification templates.
Load Balancing: Distribute requests across servers and workers using load balancers and message queues.
Duplicate Handling: Employ deduplication logic at the database level using unique keys (e.g., user_id + content_hash).
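A minimal sketch of database-level deduplication follows, using an in-memory SQLite table as a stand-in for the production database. A `UNIQUE (user_id, content_hash)` constraint makes the second insert of the same pair fail, so only the first caller is allowed to send. The table and function names are hypothetical.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sent_notifications (
        user_id      TEXT NOT NULL,
        content_hash TEXT NOT NULL,
        UNIQUE (user_id, content_hash)  -- the database rejects repeats
    )
    """
)


def content_hash(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def claim_send(user_id: str, content: str) -> bool:
    """Return True if this (user, content) pair is new and may be sent, False if it is a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sent_notifications (user_id, content_hash) VALUES (?, ?)",
                (user_id, content_hash(content)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(claim_send("user-42", "Your order has shipped"))  # True
print(claim_send("user-42", "Your order has shipped"))  # False (duplicate)
print(claim_send("user-7", "Your order has shipped"))   # True (different user)
```

Hashing the content alone would also block legitimately repeated messages, such as two identical reminders on different days. In practice the key would likely also include a notification ID or a time bucket.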


Future improvements

User Interaction: Implement APIs for user preferences management, including UI/UX designs for opting in/out.
Advanced Scheduling: Support recurring notifications and dynamic scheduling changes.
Edge Case Handling:
- Daylight saving adjustments for time zones.
- Notification prioritization during third-party outages.
Monitoring and Alerts: Real-time dashboards to monitor delivery success rates and system health.
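For the daylight-saving item, the sketch below shows how Python's `zoneinfo` exposes the two problem cases:

- Nonexistent times are wall-clock times skipped when clocks spring forward.
- Ambiguous times occur twice when clocks fall back.

It relies on PEP 495's `fold` attribute. The policy chosen here is an assumption: use the first occurrence for ambiguous times and shift past the gap for skipped ones. `zoneinfo` also needs the system time zone database or the `tzdata` package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local_time(local: datetime, tz_name: str) -> str:
    """Return 'normal', 'ambiguous' (occurs twice) or 'nonexistent' (skipped) for a naive wall time."""
    tz = ZoneInfo(tz_name)
    first = local.replace(tzinfo=tz, fold=0)
    second = local.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # The two readings disagree, so the time sits inside a DST transition.
    # An ambiguous time survives a UTC round trip; a skipped time does not.
    round_trip = first.astimezone(timezone.utc).astimezone(tz).replace(tzinfo=None)
    return "ambiguous" if round_trip == local else "nonexistent"


def local_to_utc(local: datetime, tz_name: str) -> datetime:
    """Convert a naive local send time to UTC with an explicit DST policy.

    fold=0 picks the first occurrence of an ambiguous time; for a nonexistent
    time it applies the pre-transition offset, which lands just after the gap
    (02:30 on a spring-forward night fires at 03:30 local).
    """
    return local.replace(tzinfo=ZoneInfo(tz_name), fold=0).astimezone(timezone.utc)


ny = "America/New_York"
for wall in (datetime(2024, 7, 1, 9, 0), datetime(2024, 3, 10, 2, 30), datetime(2024, 11, 3, 1, 30)):
    print(wall, classify_local_time(wall, ny), local_to_utc(wall, ny))
```

For America/New_York the loop classifies three wall times:

- 2024-07-01 09:00 is normal.
- 2024-03-10 02:30 is nonexistent, so it is sent at 07:30 UTC (03:30 EDT).
- 2024-11-03 01:30 is ambiguous, so it is sent at 05:30 UTC (the first, EDT, occurrence).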