System requirements
Functional:
- User Registration:
- Users should be able to create an account by providing basic information such as name, email address, and password.
- The system should validate the uniqueness of email addresses to ensure each user has a unique identifier.
- Users should receive a confirmation email to verify their email address and activate their account.
- Account Management:
- Users should be able to log in securely using their email address and password.
- Once logged in, users should be able to view their account balance, transaction history, and personal information.
- Users should have the option to update their personal information such as email address, password, and contact details.
- Fund Transfer:
- Users should be able to transfer funds securely to other users or external accounts.
- The system should support different transfer methods such as bank transfers, credit/debit card transfers, and peer-to-peer transfers.
- Users should have the option to set up recurring transfers or schedule future transfers.
- Payment Processing:
- Users should be able to make payments to merchants for goods and services.
- The system should support various payment methods including credit/debit cards, bank transfers, and digital wallets.
- Payments should be processed securely using encryption and tokenization to protect sensitive information.
- Fraud Detection:
- Implement algorithms to detect and prevent fraudulent transactions in real-time.
- The system should analyze transaction patterns, user behavior, and other relevant data to identify suspicious activities.
- Users should be notified promptly if any potentially fraudulent activity is detected on their account.
- Buyer and Seller Protection:
- Provide mechanisms to resolve disputes between buyers and sellers, ensuring fair and secure transactions.
- Implement policies and procedures for handling chargebacks, refunds, and disputes in accordance with industry standards.
- Multi-currency Support:
- Support transactions in multiple currencies to accommodate international payments.
- Users should be able to view prices and perform transactions in their preferred currency.
- The system should use up-to-date exchange rates for currency conversion and display.
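As a minimal sketch of the multi-currency requirement above, the snippet below converts an amount using decimal arithmetic and rounds to the target currency's minor units. The `RATES` table, the rate values, and the `convert` helper are hypothetical placeholders; a real system would fetch rates from a rate provider and store the rate that was applied alongside each transaction.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical rates for illustration only; real rates would come from a provider.
RATES = {("USD", "EUR"): Decimal("0.90"), ("USD", "JPY"): Decimal("150")}

# Number of decimal places (minor units) per currency, as defined in ISO 4217.
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0}


def convert(amount: Decimal, source: str, target: str) -> Decimal:
    """Convert amount from source to target currency, rounded to minor units."""
    if source == target:
        return amount
    rate = RATES[(source, target)]
    # Smallest representable step in the target currency: 0.01 for EUR, 1 for JPY.
    quantum = Decimal(1).scaleb(-MINOR_UNITS[target])
    return (amount * rate).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(convert(Decimal("10.00"), "USD", "EUR"))  # 9.00
print(convert(Decimal("19.99"), "USD", "JPY"))  # 2998
```

Using `Decimal` rather than binary floating point avoids rounding drift in monetary amounts; the rounding mode chosen here is an assumption and would normally be dictated by business or regulatory rules.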
Non-Functional:
- Security:
- The system should comply with industry standards for data security and privacy, including PCI DSS and GDPR.
- All sensitive data should be encrypted both in transit and at rest.
- Implement strong authentication measures such as two-factor authentication to prevent unauthorized access.
- Performance:
- The system should be highly available, with minimal downtime for maintenance and upgrades.
- Transactions should be processed promptly, with low latency to provide a seamless user experience.
- The system should be able to handle a large number of concurrent users and transactions without performance degradation.
- Scalability:
- The system should be scalable to accommodate a growing user base and transaction volume.
- Infrastructure should be designed to scale horizontally and vertically based on demand.
- Reliability:
- Ensure high reliability of the system with robust failover and disaster recovery mechanisms.
- Implement regular backups of data to prevent data loss in case of system failures.
- Usability:
- The user interface should be intuitive and easy to use, catering to users of all technical levels.
- Provide clear and concise instructions for performing various actions such as fund transfers and payments.
- Support multiple languages and accessibility features to ensure inclusivity.
- Compliance:
- Ensure compliance with regulatory requirements in all jurisdictions where the service operates.
- Keep abreast of changes in regulations related to online payments and update the system accordingly.
- Monitoring and Logging:
- Implement comprehensive logging of all system activities and transactions for audit and troubleshooting purposes.
- Set up monitoring tools to track system performance, security incidents, and potential issues in real-time.
Capacity estimation
Transactions per Second (TPS):
Let us assume the system has to handle 10 million transactions per day. To calculate the average transactions per second, we divide the total number of transactions per day by the number of seconds in a day.
Total transactions per day = 10,000,000
Number of seconds in a day = 86,400
Transactions per second = Total transactions per day / Number of seconds in a day
Transactions per second = 10,000,000 / 86,400
Transactions per second ≈ 115.74 transactions per second
So, the system needs to sustain an average of about 116 transactions per second. This is a daily average; peak traffic will be higher.
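The arithmetic above can be reproduced with a short Python sketch. The `peak_factor` parameter and the value 5 are assumptions added for illustration, since the text only states the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second over a full day."""
    return transactions_per_day / SECONDS_PER_DAY


def peak_tps(transactions_per_day: int, peak_factor: float) -> float:
    """Average TPS scaled by an assumed peak-to-average ratio."""
    return average_tps(transactions_per_day) * peak_factor


print(f"average: {average_tps(10_000_000):.2f} TPS")            # average: 115.74 TPS
print(f"peak (assumed 5x): {peak_tps(10_000_000, 5.0):.2f} TPS")  # peak (assumed 5x): 578.70 TPS
```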
Number of Servers:
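Assume one server can handle 1000 concurrent requests. At an average of about 116 transactions per second, the number of requests in flight at any moment stays well below that limit unless individual requests are very slow, so the average load alone fits on a single server.
For high availability, we also need to consider redundancy and failover. Let's assume a redundancy factor of 2.
The inputs above therefore account for only a handful of servers. A rough estimate of 200 servers is a planning figure that also has to cover peak traffic, the separate services in this design, and multi-region deployment, none of which are quantified here.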
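One way to turn these inputs into a server count is Little's law (requests in flight = arrival rate × time each request spends in the system). The sketch below is an estimate under stated assumptions: the `servers_needed` helper and the per-request latency values are hypothetical, and only the capacity of 1000 concurrent requests per server and the redundancy factor of 2 come from the text.

```python
import math


def servers_needed(
    tps: float,
    latency_seconds: float,
    concurrent_per_server: int = 1000,
    redundancy_factor: int = 2,
) -> int:
    """Estimate server count from load, latency, capacity, and redundancy."""
    # Little's law: average number of requests being processed at once.
    in_flight = tps * latency_seconds
    # At least one server is needed even for a very small load.
    base = max(1, math.ceil(in_flight / concurrent_per_server))
    return base * redundancy_factor


print(servers_needed(tps=116, latency_seconds=0.5))     # 2
print(servers_needed(tps=50_000, latency_seconds=2.0))  # 200
```

Under these assumptions, the average load needs 2 servers, while 200 servers corresponds to roughly 100,000 requests in flight, for example 50,000 TPS at 2 seconds per request.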
Storage Requirements:
If each transaction contains 100 KB of data, we can calculate the total data generated per year and then estimate the storage required for 5 years.
Data per transaction = 100 KB
Total transactions per year = 10,000,000 * 365 = 3,650,000,000
Total data generated per year = Data per transaction * Total transactions per year
Total data generated per year = 100 KB * 3,650,000,000 = 365,000,000,000 KB (365 TB)
Total data generated for 5 years = Total data generated per year * 5
Total data generated for 5 years = 365,000,000,000 KB * 5
Total data generated for 5 years = 1,825,000,000,000 KB
Total data generated for 5 years = 1.825 petabytes (PB), using decimal units (1 PB = 10^12 KB)
So, approximately 1.825 petabytes of raw storage will be required for 5 years, before accounting for replication, indexes, and backups.
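The storage estimate can be checked with a small Python sketch. The `storage_pb` helper and the `replication_factor` parameter are assumptions introduced here; the text itself estimates raw data only.

```python
KB_PER_PB = 1000 ** 4  # decimal units: 1 PB = 10^12 KB


def storage_pb(
    transactions_per_day: int,
    kb_per_transaction: int,
    years: int,
    replication_factor: int = 1,
) -> float:
    """Total storage in petabytes for the given retention period."""
    total_kb = (
        transactions_per_day * 365 * years * kb_per_transaction * replication_factor
    )
    return total_kb / KB_PER_PB


print(storage_pb(10_000_000, 100, 5))                        # 1.825
print(storage_pb(10_000_000, 100, 5, replication_factor=3))  # 5.475
```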
Performance Requirements:
- Response time: Ensure that transactions are processed within an acceptable response time (e.g., milliseconds).
- Throughput: Maintain a high throughput to handle the expected transaction volume.
- Scalability: Ensure the system can scale horizontally and vertically to accommodate increased load.
- Availability: Maintain high availability to ensure the system is accessible to users at all times.
- Reliability: Minimize downtime and ensure the system can recover from failures quickly.
- Resource utilization: Optimize resource utilization (CPU, memory, network) to maximize efficiency and minimize costs.
- Error rates: Keep error rates low to ensure accurate and reliable transaction processing.
API design
A payment system requires multiple APIs. Below is a list of a few essential ones.
- User Authentication API:
- This API is essential for handling user authentication during the registration and login processes.
- It would provide endpoints for user registration, login, logout, and password reset functionalities.
- OAuth 2.0 (typically combined with OpenID Connect for user authentication) and JWT (JSON Web Tokens) as a token format could be used here.
- Payment Processing APIs:
- Payment Gateway API: This API facilitates communication between the payment service and financial institutions (banks, credit card networks) to authorize and process transactions.
- Payment Method APIs: APIs for handling different payment methods such as credit/debit cards, bank transfers, digital wallets (e.g., Apple Pay, Google Pay), and cryptocurrency payments.
- Subscription Billing API: If the service offers subscription-based billing, an API for managing subscription plans, recurring payments, and subscription lifecycle events would be necessary.
- Account Management APIs:
- User Account API: This API allows users to view their account balance, transaction history, and manage personal information.
- Funds Transfer API: Enables users to transfer funds between their accounts, to other users, or external accounts securely.
- Currency Exchange API: If multi-currency support is required, an API for currency conversion would be necessary.
- Notification APIs:
- Email/SMS Notification API: Allows sending notifications to users about transaction statuses, account updates, security alerts, and other important events.
- Push Notification API: For sending real-time notifications to users' mobile devices, enhancing user engagement and providing timely updates.
- Integration APIs:
- Merchant Integration API: For merchants to integrate their websites or applications with the payment service, enabling them to accept payments.
- Third-Party Service Integration APIs: APIs for integrating with third-party services such as e-commerce platforms, invoicing systems, and accounting software to streamline payment processing and data synchronization.
- Compliance APIs:
- Compliance Check API: Integrates with regulatory compliance services to verify user identities, perform Know Your Customer (KYC) checks, and ensure compliance with anti-money laundering (AML) regulations.
- Support APIs:
- Customer Support API: Provides support ticket management, chat support integration, and access to knowledge base articles for assisting users with inquiries and issues.
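To make one of these APIs more concrete, here is a minimal sketch of the request payload and validation that a Funds Transfer API endpoint might use. The `TransferRequest` fields and the `validate_transfer` helper are hypothetical and are not taken from any specific payment provider.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical payload for a funds transfer endpoint."""
    sender_id: str
    recipient_id: str
    amount: Decimal
    currency: str  # three-letter uppercase code such as "USD"


def validate_transfer(request: TransferRequest) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    if request.amount <= 0:
        errors.append("amount must be positive")
    if request.sender_id == request.recipient_id:
        errors.append("sender and recipient must differ")
    code = request.currency
    if len(code) != 3 or not code.isalpha() or not code.isupper():
        errors.append("currency must be a three-letter uppercase code")
    return errors


ok = TransferRequest("user-1", "user-2", Decimal("25.00"), "USD")
bad = TransferRequest("user-1", "user-1", Decimal("-5"), "usd")
print(validate_transfer(ok))   # []
print(validate_transfer(bad))  # three error messages
```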
Database design
For the tables required in this design, refer to the class diagram. The list of classes is not exhaustive, but it is a good set of tables to start with.
Database Choices
- User Information and Transactional Data:
- Database Type: SQL (e.g., PostgreSQL)
- Reasoning: SQL databases offer ACID transactions and strong consistency, making them suitable for storing critical data such as user information, account balances, and transaction records.
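- CAP Theorem Focus: Consistency Focused (CP). Partition tolerance is not optional in a distributed deployment; when a partition occurs, a single-primary SQL setup with synchronous replication gives up availability rather than consistency.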
- Session Management and Caching:
- Database Type: Key-Value Store (e.g., Redis)
- Reasoning: Key-value stores are optimized for high-speed access and low latency, making them ideal for caching frequently accessed data such as user sessions, authentication tokens, and temporary data.
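- CAP Theorem Focus: Availability Focused. With asynchronous replication, Redis favors availability and low latency over strong consistency; acknowledged writes can be lost on failover, which is acceptable for sessions and cache entries that can be rebuilt.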
- Notification and Preference Data:
- Database Type: Document Store (e.g., MongoDB)
- Reasoning: Document stores provide flexible schema and horizontal scalability, making them suitable for storing semi-structured data such as notifications, user preferences, and transaction details.
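- CAP Theorem Focus: Tunable. A MongoDB replica set has a single primary for writes, so with majority write concern it behaves like a consistency-focused (CP) system; read and write concerns can be relaxed to trade consistency for latency and availability.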
- Transaction Logs and Audit Trails:
- Database Type: Wide-Column Store (e.g., Cassandra)
- Reasoning: Wide-column stores handle write-heavy, append-oriented workloads such as time-series data well and offer linear scalability and fault tolerance, making them suitable for storing transaction logs, fraud detection records, and audit trails.
- CAP Theorem Focus: Availability Focused (AP) with tunable consistency. Cassandra stays available during partitions and lets each query choose a consistency level, for example quorum reads and writes when stronger guarantees are needed.
- Historical Data and Bookkeeping:
- Database Type: Data Warehouse (e.g., Amazon Redshift, Google BigQuery)
- Reasoning: Data warehouses are optimized for storing and analyzing large volumes of historical data efficiently, making them suitable for bookkeeping, analytics, and reporting purposes.
- CAP Theorem Focus: Less relevant here. Data warehouses are loaded in batches or near-real-time streams and serve analytical queries, so they give up real-time updates and transactional capabilities in favor of scalability and query performance.
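To illustrate why ACID transactions matter for account balances, the sketch below moves funds between two accounts inside a single transaction, so that either both balances change or neither does. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the `accounts` table, its columns, and the `transfer` helper are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, sender: str, recipient: str, amount_cents: int) -> None:
    """Move funds atomically between two accounts."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? "
            "WHERE id = ? AND balance_cents >= ?",
            (amount_cents, sender, amount_cents),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown sender")
        cur = conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
            (amount_cents, recipient),
        )
        if cur.rowcount != 1:
            # Raising here rolls back the debit made above.
            raise ValueError("unknown recipient")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 10_000), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 2_500)
print(dict(conn.execute("SELECT id, balance_cents FROM accounts")))
```

Amounts are stored as integer cents to avoid floating-point rounding; that choice is also an assumption of this sketch.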
Data Partitioning
For the given problem of designing an online payment system, the best partitioning strategy would likely involve a combination of regional or geographical partitioning and functional partitioning.
- Regional or Geographical Partitioning:
- Since online payment systems often need to comply with regional regulations and cater to users in different geographic locations, partitioning data based on regions can help optimize data access and ensure compliance.
- Tables such as User, Merchant, and Transaction could be partitioned based on the geographic regions they belong to, ensuring data locality and reducing latency for users in specific areas.
- Functional Partitioning:
- Functional partitioning involves dividing data based on the functionality or usage patterns of the application.
- For example, separating transactional data (e.g., transactions, payment methods) from non-transactional data (e.g., user preferences, notifications) can help optimize data access and scalability.
- This approach allows for better resource allocation and optimization of database performance based on the specific needs of different parts of the system.
Partitioning Algorithm:
When it comes to implementing partitioning for the system, a common algorithm used is consistent hashing.
- Consistent hashing maps both keys and nodes onto the same hash ring, so that when a node is added or removed only the keys adjacent to it move, rather than nearly all keys as with modulo-based hashing.
- With several virtual nodes per physical node, the distribution of data across partitions becomes close to uniform, which keeps data access efficient as the system grows.
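A minimal sketch of a consistent hash ring with virtual nodes is shown below. The class name, the choice of SHA-256, and the default of 100 virtual nodes per node are assumptions made for illustration, not a prescribed implementation.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each physical node owns several points to even out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("db-d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

When a node is added, every key that moves is reassigned to the new node; keys never shuffle between the existing nodes.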
Scaling Strategy:
Horizontal scaling is likely the best fit for the databases in this scenario. Adding nodes spreads the growing user base and transaction volume across more machines and improves availability. The NoSQL stores listed above are built to scale out; the SQL store needs sharding (for example along the regional partitions described earlier) or read replicas, which requires the application to be aware of the partition key.
Read/Write Separation:
Read/write separation can improve the system's performance and scalability. By directing read operations to read replicas or caches, the system can serve a larger volume of read requests without slowing down writes, which improves responsiveness during peak usage periods. Because replicas can lag behind the primary, reads that must reflect the latest write, such as an account balance shown right after a transfer, should still go to the primary.
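The routing decision can be sketched as follows. The `ReadWriteRouter` class, the `require_fresh` flag, and the use of plain strings in place of real database connections are hypothetical simplifications.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self._primary = primary
        # Round-robin over replicas; None when no replica is configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, require_fresh: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Reads that must see the latest write bypass the replicas,
        # because replicas may lag behind the primary.
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)
        return self._primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM transactions WHERE user_id = 1"))  # replica-1
print(router.route("UPDATE accounts SET balance_cents = 0"))         # primary
print(router.route("SELECT balance_cents FROM accounts", require_fresh=True))  # primary
```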
High-level design
- User Interface (UI):
- Frontend application for users to interact with the system, including features for account management, fund transfers, payment processing, and notifications.
- Authentication and Authorization Service:
- Service responsible for user authentication, session management, and access control, ensuring secure access to the system's functionalities.
- User Management Service:
- Service for user registration, profile management, and account-related functionalities, such as updating personal information and managing payment methods.
- Payment Processing Service:
- Core service for processing payments, handling transactions between users and merchants, integrating with payment gateways, and ensuring secure payment processing.
- Fraud Detection Service:
- Service for detecting and preventing fraudulent activities, implementing algorithms for real-time fraud detection, monitoring transaction patterns, and triggering alerts for suspicious activities.
- Notification Service:
- Service for sending notifications to users, including transaction updates, account alerts, and promotional messages, using various channels such as email, SMS, and push notifications.
- Analytics and Reporting Service:
- Service for generating reports, analyzing transaction data, and providing insights into user behavior, transaction trends, and business performance, supporting decision-making and strategic planning.
- Integration Services:
- Services for integrating with external systems and third-party APIs, including merchant platforms, payment gateways, accounting software, and regulatory compliance services.
- Data Storage and Persistence Layer:
- Database infrastructure for storing various types of data, including user information, transaction records, payment methods, notifications, and system logs, employing a combination of SQL and NoSQL databases tailored to specific data needs.
- Caching Layer:
- Caching infrastructure for improving performance and reducing latency, caching frequently accessed data such as user sessions, authentication tokens, and frequently accessed transaction data.
- Load Balancer and Scalability Components:
- Load balancers and auto-scaling mechanisms for distributing incoming traffic across multiple servers and scaling the system horizontally based on demand, ensuring high availability and optimal performance.
- Monitoring and Logging Infrastructure:
- Infrastructure for monitoring system health, performance metrics, and logging events, enabling proactive monitoring, troubleshooting, and performance optimization.
Request flows
Below is a simple sequence diagram for Payment flow.
Detailed component design
Before jumping into the detailed design, let's first understand a few components and their roles.
- Payment Service:
- Accepts payment events from users and coordinates the payment process.
- Performs risk checks using a third-party provider to ensure compliance with regulations and to detect criminal activity.
- Stores payment events in the database and manages the flow of payment orders.
- Payment Executor:
- Executes individual payment orders via a Payment Service Provider (PSP).
- Stores payment orders in the database and interacts with external PSPs to process credit card payments.
- Payment Service Provider (PSP):
- Moves money from the buyer's account to the seller's account.
- Handles the actual transfer of funds and interacts with card schemes to process credit card operations.
- Card Schemes:
- Organizations like Visa, MasterCard, etc., that process credit card transactions.
- Ledger:
- Maintains a financial record of payment transactions, tracking debits and credits.
- Crucial for post-payment analysis, revenue calculation, and forecasting.
- Wallet:
- Manages the account balance of merchants, recording payments received.
- Updates balance information after successful payment processing.
- Hosted Payment Page:
- The PSP provides a hosted payment page that captures the customer card information directly, rather than relying on our payment service.
Now let's discuss the detailed component design.
Payment Service:
- Responsibilities: Orchestrates the payment process, receives payment events from users, performs risk checks, coordinates payment execution, and updates relevant systems.
- Key Features:
- Risk Assessment: Utilizes third-party services to conduct risk assessments, ensuring compliance with regulations and detecting fraudulent activities.
- Transaction Handling: Manages the lifecycle of payment transactions, including authorization, settlement, and reconciliation.
- Integration: Integrates with PSPs, external APIs, and internal services to facilitate payment processing and data exchange.
- Event Logging: Logs payment events, errors, and system activities for auditing and troubleshooting purposes.
- Technologies: Microservices architecture, RESTful APIs, asynchronous messaging (e.g., Kafka), database storage (e.g., PostgreSQL), and containerization (e.g., Docker).
Payment Executor:
- Responsibilities: Executes individual payment orders on behalf of the Payment Service, interacts with PSPs to process payments, and updates payment statuses.
- Key Features:
- Payment Order Management: Stores and manages payment orders, including details such as amount, currency, payment method, and transaction status.
- PSP Integration: Communicates with PSPs via secure APIs to authorize and process payments, handle redirects for 3D Secure authentication, and retrieve transaction statuses.
- Error Handling: Implements robust error handling mechanisms to handle communication failures, transaction errors, and retries.
- Transaction Logging: Logs transaction details, responses from PSPs, and error messages for auditing and reconciliation.
- Technologies: Microservices architecture, distributed transactions, secure API communication (e.g., OAuth), payment gateway integration (e.g., Stripe, PayPal), and logging frameworks (e.g., ELK stack).
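The retry part of the Payment Executor's error handling could look like the sketch below, which retries transient failures with exponential backoff and jitter. The `TransientPSPError` exception, the `call_with_retries` helper, and the default delays are hypothetical. Retrying a payment call is only safe when the PSP can deduplicate the request, for example through an idempotency key.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientPSPError(Exception):
    """Hypothetical error for failures worth retrying, such as timeouts."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientPSPError:
            if attempt == max_attempts:
                raise  # give up and let the caller mark the payment as failed
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that the backoff schedule can be tested without waiting.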
Double-Entry Ledger System
The double-entry system is fundamental to any payment system and is key to accurate bookkeeping. It records every payment transaction in two separate ledger accounts with the same amount: one account is debited and the other is credited. As a result, the sum of all transaction entries must be 0; one cent lost by one account is a cent gained by another. This provides end-to-end traceability and ensures consistency throughout the payment cycle.
Overview:
- Foundational accounting method ensuring accuracy in financial records.
- Each transaction involves at least two accounts: a debit and a credit.
- Maintains balance between debits and credits, adhering to the accounting equation: assets = liabilities + equity.
Application in Payment Systems:
- Used to record transactions involving movement of funds between accounts.
- Each payment transaction results in debits to one account (e.g., payer's account) and credits to another account (e.g., payee's account).
- Ensures accuracy and completeness of financial records by maintaining balance between debits and credits.
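The zero-sum rule can be sketched in a few lines of Python. The `Entry` and `Ledger` names and the sign convention (negative for funds leaving an account, positive for funds arriving) are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    account: str
    amount_cents: int  # negative: funds leave the account; positive: funds arrive


class Ledger:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def post(self, entries: List[Entry]) -> None:
        """Record one transaction; its entries must sum to zero."""
        if len(entries) < 2:
            raise ValueError("a transaction needs at least two entries")
        if sum(e.amount_cents for e in entries) != 0:
            raise ValueError("unbalanced transaction")
        self.entries.extend(entries)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self.entries if e.account == account)


ledger = Ledger()
ledger.post([Entry("buyer", -1_000), Entry("seller", 1_000)])
print(ledger.balance("buyer"), ledger.balance("seller"))  # -1000 1000
print(sum(e.amount_cents for e in ledger.entries))        # 0
```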
Reconciliation
In an asynchronous communication environment, where messages may not be delivered or responses returned, ensuring correctness becomes crucial. Reconciliation serves as a practice to periodically compare states among related services, verifying their agreement, and acting as the last line of defense in payment systems. Every night the PSP or banks send a settlement file to their clients. The settlement file contains the balance of the bank account, together with all the transactions that took place on this bank account during the day. The reconciliation system parses the settlement file and compares the details with the ledger system. Reconciliation is also used to verify that the payment system is internally consistent. For example, the states in the ledger and wallet might diverge and we could use the reconciliation system to detect any discrepancy.
Importance of Reconciliation:
- Accuracy Assurance: Validates that transaction records match across different systems, ensuring accuracy in financial data.
- Error Detection: Identifies discrepancies or errors in transaction records so they can be resolved promptly, before they turn into financial losses.
- Compliance Adherence: Ensures compliance with regulatory requirements and industry standards by verifying the accuracy of financial records.
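A minimal sketch of the comparison step is shown below, assuming both the ledger and the parsed settlement file have already been reduced to a mapping from transaction ID to amount in cents. The `reconcile` helper and the sample transaction IDs are hypothetical.

```python
from typing import Dict, List


def reconcile(ledger: Dict[str, int], settlement: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare two transaction_id -> amount_cents maps and report differences."""
    shared = ledger.keys() & settlement.keys()
    return {
        # In the settlement file but never recorded internally.
        "missing_in_ledger": sorted(settlement.keys() - ledger.keys()),
        # Recorded internally but not confirmed by the PSP or bank.
        "missing_in_settlement": sorted(ledger.keys() - settlement.keys()),
        # Present on both sides with different amounts.
        "amount_mismatch": sorted(t for t in shared if ledger[t] != settlement[t]),
    }


ledger_records = {"tx-1": 1_000, "tx-2": 2_500, "tx-3": 700}
settlement_records = {"tx-1": 1_000, "tx-2": 2_000, "tx-4": 300}
print(reconcile(ledger_records, settlement_records))
```

Any non-empty list in the result would typically be queued for automatic correction or manual review.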
Trade-offs/Tech choices
Explain any trade-offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?