Design an Online Payment Service
by alchemy1135
System requirements
Functional:
- User Registration:
- Users should be able to create an account by providing basic information such as name, email address, and password.
- The system should validate the uniqueness of email addresses to ensure each user has a unique identifier.
- Users should receive a confirmation email to verify their email address and activate their account.
- Account Management:
- Users should be able to log in securely using their email address and password.
- Once logged in, users should be able to view their account balance, transaction history, and personal information.
- Users should have the option to update their personal information such as email address, password, and contact details.
- Fund Transfer:
- Users should be able to transfer funds securely to other users or external accounts.
- The system should support different transfer methods such as bank transfers, credit/debit card transfers, and peer-to-peer transfers.
- Users should have the option to set up recurring transfers or schedule future transfers.
- Payment Processing:
- Users should be able to make payments to merchants for goods and services.
- The system should support various payment methods including credit/debit cards, bank transfers, and digital wallets.
- Payments should be processed securely using encryption and tokenization to protect sensitive information.
- Fraud Detection:
- Implement algorithms to detect and prevent fraudulent transactions in real-time.
- The system should analyze transaction patterns, user behavior, and other relevant data to identify suspicious activities.
- Users should be notified promptly if any potentially fraudulent activity is detected on their account.
- Buyer and Seller Protection:
- Provide mechanisms to resolve disputes between buyers and sellers, ensuring fair and secure transactions.
- Implement policies and procedures for handling chargebacks, refunds, and disputes in accordance with industry standards.
- Multi-currency Support:
- Support transactions in multiple currencies to accommodate international payments.
- Users should be able to view prices and perform transactions in their preferred currency.
- The system should use up-to-date exchange rates for currency conversion and display.
Non-Functional:
- Security:
- The system should comply with industry standards for data security and privacy, including PCI DSS and GDPR.
- All sensitive data should be encrypted during transmission and storage.
- Implement strong authentication measures such as two-factor authentication to prevent unauthorized access.
- Performance:
- The system should be highly available, with minimal downtime for maintenance and upgrades.
- Transactions should be processed promptly, with low latency to provide a seamless user experience.
- The system should be able to handle a large number of concurrent users and transactions without performance degradation.
- Scalability:
- The system should be scalable to accommodate a growing user base and transaction volume.
- Infrastructure should be designed to scale horizontally and vertically based on demand.
- Reliability:
- Ensure high reliability of the system with robust failover and disaster recovery mechanisms.
- Implement regular backups of data to prevent data loss in case of system failures.
- Usability:
- The user interface should be intuitive and easy to use, catering to users of all technical levels.
- Provide clear and concise instructions for performing various actions such as fund transfers and payments.
- Support multiple languages and accessibility features to ensure inclusivity.
- Compliance:
- Ensure compliance with regulatory requirements in all jurisdictions where the service operates.
- Keep abreast of changes in regulations related to online payments and update the system accordingly.
- Monitoring and Logging:
- Implement comprehensive logging of all system activities and transactions for audit and troubleshooting purposes.
- Set up monitoring tools to track system performance, security incidents, and potential issues in real-time.
Capacity estimation
Transactions per Second (TPS):
Let us assume we have to handle 10 million transactions per day. To calculate the average transactions per second, we divide the total number of transactions per day by the number of seconds in a day.
Total transactions per day = 10,000,000
Number of seconds in a day = 86,400
Transactions per second = Total transactions per day / Number of seconds in a day
Transactions per second = 10,000,000 / 86,400
Transactions per second ≈ 115.74
So, the system needs to process approximately 116 transactions per second on average; peak traffic will be higher than this daily average.
Number of Servers:
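Given that 1 server can handle 1000 concurrent requests, we'll need to calculate the number of servers required to handle the total load.
At an average of roughly 116 transactions per second, the average load alone would fit on very few such servers, so the count is driven by peak traffic, by the number of services each transaction passes through, and by redundancy. For high availability we also need redundancy and failover; let's assume a redundancy factor of 2.
As a rough, deliberately conservative estimate, we budget about 200 servers across all services to support our scale.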
Storage Requirements:
If each transaction contains 100 KB of data, we can calculate the total data generated per year and then estimate the storage required for 5 years.
Data per transaction = 100 KB
Total transactions per year = 10,000,000 * 365 = 3,650,000,000
Total data generated per year = Data per transaction * Total transactions per year
Total data generated per year = 100 KB * 3,650,000,000 = 365,000,000,000 KB (365 TB)
Total data generated for 5 years = Total data generated per year * 5
Total data generated for 5 years = 365,000,000,000 KB * 5
Total data generated for 5 years = 1,825,000,000,000 KB
Total data generated for 5 years ≈ 1.825 petabytes (PB), using decimal units (1 PB = 10^12 KB)
So, approximately 1.825 petabytes of storage will be required for 5 years.
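The arithmetic above can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, traffic is assumed to be spread evenly over the day, and decimal units (1 KB = 1,000 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def transactions_per_second(tx_per_day: int) -> float:
    """Average TPS, assuming traffic is spread evenly over the day."""
    return tx_per_day / SECONDS_PER_DAY


def storage_bytes(tx_per_day: int, kb_per_tx: int, years: int) -> int:
    """Raw storage in bytes, using decimal units (1 KB = 1,000 bytes)."""
    return tx_per_day * 365 * years * kb_per_tx * 1_000


tps = transactions_per_second(10_000_000)
total = storage_bytes(10_000_000, kb_per_tx=100, years=5)

print(f"{tps:.2f} TPS")          # 115.74 TPS
print(f"{total / 1e15:.3f} PB")  # 1.825 PB
```

Changing the inputs (for example a smaller per-transaction record size) shows quickly how sensitive the storage figure is to the 100 KB assumption.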
Performance Requirements:
- Response time: Ensure that transactions are processed within an acceptable response time (e.g., milliseconds).
- Throughput: Maintain a high throughput to handle the expected transaction volume.
- Scalability: Ensure the system can scale horizontally and vertically to accommodate increased load.
- Availability: Maintain high availability to ensure the system is accessible to users at all times.
- Reliability: Minimize downtime and ensure the system can recover from failures quickly.
- Resource utilization: Optimize resource utilization (CPU, memory, network) to maximize efficiency and minimize costs.
- Error rates: Keep error rates low to ensure accurate and reliable transaction processing.
API design
A payment system requires multiple APIs. Below is a list of a few essential ones.
- User Authentication API:
- This API is essential for handling user authentication during the registration and login processes.
- It would provide endpoints for user registration, login, logout, and password reset functionalities.
- Technologies such as OAuth 2.0 or JWT (JSON Web Tokens) could be used for authentication.
- Payment Processing APIs:
- Payment Gateway API: This API facilitates communication between the payment service and financial institutions (banks, credit card networks) to authorize and process transactions.
- Payment Method APIs: APIs for handling different payment methods such as credit/debit cards, bank transfers, digital wallets (e.g., Apple Pay, Google Pay), and cryptocurrency payments.
- Subscription Billing API: If the service offers subscription-based billing, an API for managing subscription plans, recurring payments, and subscription lifecycle events would be necessary.
- Account Management APIs:
- User Account API: This API allows users to view their account balance, transaction history, and manage personal information.
- Funds Transfer API: Enables users to transfer funds between their accounts, to other users, or external accounts securely.
- Currency Exchange API: If multi-currency support is required, an API for currency conversion would be necessary.
- Notification APIs:
- Email/SMS Notification API: Allows sending notifications to users about transaction statuses, account updates, security alerts, and other important events.
- Push Notification API: For sending real-time notifications to users' mobile devices, enhancing user engagement and providing timely updates.
- Integration APIs:
- Merchant Integration API: For merchants to integrate their websites or applications with the payment service, enabling them to accept payments.
- Third-Party Service Integration APIs: APIs for integrating with third-party services such as e-commerce platforms, invoicing systems, and accounting software to streamline payment processing and data synchronization.
- Compliance APIs:
- Compliance Check API: Integrates with regulatory compliance services to verify user identities, perform Know Your Customer (KYC) checks, and ensure compliance with anti-money laundering (AML) regulations.
- Support APIs:
- Customer Support API: Provides support ticket management, chat support integration, and access to knowledge base articles for assisting users with inquiries and issues.
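As an illustration of what one of these APIs might accept, here is a minimal sketch of a request payload for the Funds Transfer API with basic validation. The field names, the `TransferRequest` type, and the `validate_transfer` function are assumptions made for this example, not part of the original design.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical payload for a funds transfer call."""
    source_account_id: str
    destination_account_id: str
    amount: Decimal  # Decimal avoids floating-point rounding on money
    currency: str    # expected to be a 3-letter code such as "USD"


def validate_transfer(req: TransferRequest) -> list[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    if req.amount <= 0:
        errors.append("amount must be positive")
    if req.source_account_id == req.destination_account_id:
        errors.append("source and destination accounts must differ")
    code = req.currency
    if len(code) != 3 or not code.isalpha() or not code.isupper():
        errors.append("currency must be a 3-letter uppercase code")
    return errors
```

A real service would also check authorization and available balance; those steps are left out to keep the sketch small.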
Database design
For the tables required in this design, refer to the class diagram. The list of classes is not exhaustive, but it is a good set of tables to start with.
Database Choices
- User Information and Transactional Data:
- Database Type: SQL (e.g., PostgreSQL)
- Reasoning: SQL databases offer ACID transactions and strong consistency, making them suitable for storing critical data such as user information, account balances, and transaction records.
- CAP Theorem Focus: Consistency Focused. When a network partition occurs, a single-primary SQL deployment favors strong consistency over availability.
- Session Management and Caching:
- Database Type: Key-Value Store (e.g., Redis)
- Reasoning: Key-value stores are optimized for high-speed access and low latency, making them ideal for caching frequently accessed data such as user sessions, authentication tokens, and temporary data.
- CAP Theorem Focus: Availability Focused. Key-value stores prioritize high availability and partition tolerance over strong consistency.
- Notification and Preference Data:
- Database Type: Document Store (e.g., MongoDB)
- Reasoning: Document stores provide flexible schema and horizontal scalability, making them suitable for storing semi-structured data such as notifications, user preferences, and transaction details.
- CAP Theorem Focus: Balanced. Document stores aim to achieve a balance between availability, consistency, and partition tolerance.
- Transaction Logs and Audit Trails:
- Database Type: Wide-Column Store (e.g., Cassandra)
- Reasoning: Wide-column stores are designed for time-series data and offer linear scalability and fault tolerance, making them suitable for storing transaction logs, fraud detection records, and audit trails.
- CAP Theorem Focus: Availability and Partition Tolerance Focused. Stores such as Cassandra stay available during network partitions and offer tunable, eventually consistent reads and writes, which supports fault tolerance and scalability in distributed systems.
- Historical Data and Bookkeeping:
- Database Type: Data Warehouse (e.g., Amazon Redshift, Google BigQuery)
- Reasoning: Data warehouses are optimized for storing and analyzing large volumes of historical data efficiently, making them suitable for bookkeeping, analytics, and reporting purposes.
- CAP Theorem Focus: Balanced. Data warehouses prioritize consistency and availability for analytical queries, often sacrificing real-time updates and transactional capabilities in favor of scalability and performance.
Data Partitioning
For the given problem of designing an online payment system, the best partitioning strategy would likely involve a combination of regional or geographical partitioning and functional partitioning.
- Regional or Geographical Partitioning:
- Since online payment systems often need to comply with regional regulations and cater to users in different geographic locations, partitioning data based on regions can help optimize data access and ensure compliance.
- Tables such as User, Merchant, and Transaction could be partitioned based on the geographic regions they belong to, ensuring data locality and reducing latency for users in specific areas.
- Functional Partitioning:
- Functional partitioning involves dividing data based on the functionality or usage patterns of the application.
- For example, separating transactional data (e.g., transactions, payment methods) from non-transactional data (e.g., user preferences, notifications) can help optimize data access and scalability.
- This approach allows for better resource allocation and optimization of database performance based on the specific needs of different parts of the system.
Partitioning Algorithm:
When it comes to implementing partitioning for the system, a common algorithm used is consistent hashing.
- Consistent hashing ensures a uniform distribution of data across partitions while minimizing the need for data migration when the number of partitions changes.
- It provides a balanced approach to distributing data across partitions, ensuring efficient data access and scalability as the system grows.
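A minimal sketch of consistent hashing with virtual nodes is shown below. The class name, the node names, and the choice of 100 virtual nodes per server are illustrative assumptions; a production system would more likely rely on the partitioning built into its database.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node only moves keys onto that node."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a stable 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each physical node gets several positions to even out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring position at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
print(ring.get_node("user-42"))  # one of db-a, db-b, db-c
```

When a fourth node is added, only the keys that now fall on its ring positions are reassigned; every other key keeps its previous node.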
Scaling Strategy:
Horizontal scaling would be the best strategy for scaling the databases in this scenario. It allows for adding more servers to the existing infrastructure, enabling better performance and higher availability without significant changes to the application architecture. With the growing user base and transaction volume, horizontal scaling ensures seamless expansion by distributing the workload across multiple nodes.
Read/Write Separation:
Implementing Read/Write Separation could be beneficial to optimize the system's performance and scalability. By directing read operations to read replicas or caches, the system can handle a larger volume of read requests without impacting write operations. This separation improves overall system responsiveness and user experience, especially during peak usage periods.
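The routing decision behind read/write separation can be sketched as follows. `ReadWriteRouter` and the connection names are hypothetical; the sketch assumes that any statement not starting with `SELECT` is a write, and that reads inside a transaction must see the latest data and therefore go to the primary.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and spreads plain reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self._primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not in_transaction:
            return next(self._replicas)  # round-robin over replicas
        # Writes, and reads that must not observe replication lag, use the primary.
        return self._primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM transactions WHERE user_id = 7"))
print(router.route("UPDATE accounts SET balance = balance - 10 WHERE id = 7"))
```

For a payment system, balance checks that gate a transfer are a typical case for `in_transaction=True`, since a replica may lag behind the primary.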
High-level design
- User Interface (UI):
- Frontend application for users to interact with the system, including features for account management, fund transfers, payment processing, and notifications.
- Authentication and Authorization Service:
- Service responsible for user authentication, session management, and access control, ensuring secure access to the system's functionalities.
- User Management Service:
- Service for user registration, profile management, and account-related functionalities, such as updating personal information and managing payment methods.
- Payment Processing Service:
- Core service for processing payments, handling transactions between users and merchants, integrating with payment gateways, and ensuring secure payment processing.
- Fraud Detection Service:
- Service for detecting and preventing fraudulent activities, implementing algorithms for real-time fraud detection, monitoring transaction patterns, and triggering alerts for suspicious activities.
- Notification Service:
- Service for sending notifications to users, including transaction updates, account alerts, and promotional messages, using various channels such as email, SMS, and push notifications.
- Analytics and Reporting Service:
- Service for generating reports, analyzing transaction data, and providing insights into user behavior, transaction trends, and business performance, supporting decision-making and strategic planning.
- Integration Services:
- Services for integrating with external systems and third-party APIs, including merchant platforms, payment gateways, accounting software, and regulatory compliance services.
- Data Storage and Persistence Layer:
- Database infrastructure for storing various types of data, including user information, transaction records, payment methods, notifications, and system logs, employing a combination of SQL and NoSQL databases tailored to specific data needs.
- Caching Layer:
- Caching infrastructure for improving performance and reducing latency, caching frequently accessed data such as user sessions, authentication tokens, and frequently accessed transaction data.
- Load Balancer and Scalability Components:
- Load balancers and auto-scaling mechanisms for distributing incoming traffic across multiple servers and scaling the system horizontally based on demand, ensuring high availability and optimal performance.
- Monitoring and Logging Infrastructure:
- Infrastructure for monitoring system health, performance metrics, and logging events, enabling proactive monitoring, troubleshooting, and performance optimization.
Request flows
Below is a simple sequence diagram for Payment flow.
Detailed component design
Before jumping into the detailed design, let's first understand a few components and their roles.
- Payment Service:
- Accepts payment events from users and coordinates the payment process.
- Performs risk checks using a third-party provider to ensure compliance with regulations and detect criminal activity.
- Stores payment events in the database and manages the flow of payment orders.
- Payment Executor:
- Executes individual payment orders via a Payment Service Provider (PSP).
- Stores payment orders in the database and interacts with external PSPs to process credit card payments.
- Payment Service Provider (PSP):
- Moves money from the buyer's account to the seller's account.
- Handles the actual transfer of funds and interacts with card schemes to process credit card operations.
- Card Schemes:
- Organizations like Visa, MasterCard, etc., that process credit card transactions.
- Ledger:
- Maintains a financial record of payment transactions, tracking debits and credits.
- Crucial for post-payment analysis, revenue calculation, and forecasting.
- Wallet:
- Manages the account balance of merchants, recording payments received.
- Updates balance information after successful payment processing.
- Hosted Payment Page:
- The PSP provides a hosted payment page that captures the customer card information directly, rather than relying on our payment service.
Let's have a look at a typical flow.
Step 1: User Purchase
- User Interaction: The user purchases goods and enters card details on the PCI DSS certified checkout page provided by the merchant.
Step 2: Payment Gateway (PG)
- Capture Card Details: All card details provided by the customer are captured in XML/JSON format.
- Transmission to Payment Gateway: The captured card details are sent from the Merchant UI to the Payment Gateway (e.g., Razorpay, Stripe) for transaction processing.
Step 3: Payment Processor (PP)
- Message Format Conversion: The Payment Processor converts the received XML/JSON message into the ISO 8583 message format, which is understood by the EFT (Electronic Funds Transfer) switch.
Step 4: Card Association (CA)
- Transfer to Associated Banks: The Card Association (e.g., Mastercard, Visa) routes the request to the issuing bank identified by the leading digits of the card number, known as the Bank Identification Number (BIN), traditionally the first 6 digits.
- Validation Checks: The associated Issuer Bank validates the card details, expiry, and customer balance.
Step 5: Issuer Bank (IB)
- Validation Checks: The Issuer Bank validates the card details by:
- Verifying card authenticity.
- Checking card expiry.
- Verifying customer balance.
Step 6: Transaction Authorization
- Accept/Reject Transaction: If all validation checks pass, the Issuer Bank accepts the transaction; otherwise, it rejects it.
- Transaction Response: The Issuer Bank sends a message back to the Card Association indicating whether the transaction was accepted or rejected.
Step 7: Transaction Outcome
- Transaction Outcome: Based on the response from the Issuer Bank, the transaction is marked as completed or failed.
- Feedback to Merchant: The Payment Processor communicates the transaction outcome (successful or failed) back to the Merchant UI for user feedback.
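The issuer-side checks in Steps 5 and 6 can be sketched as a single decision function. This is a simplified illustration: the `Card` fields, the `is_active` flag standing in for card authenticity checks, and the response strings are all assumptions, and real issuers apply many more rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    number: str
    expiry_year: int
    expiry_month: int
    available_balance_cents: int
    is_active: bool = True  # stand-in for the issuer's card authenticity checks


def authorize(card: Card, amount_cents: int, today: date) -> tuple[bool, str]:
    """Return (accepted, reason) after the three validation checks."""
    if not card.is_active:
        return False, "card not recognized"
    # A card stays valid through the last day of its expiry month.
    if (today.year, today.month) > (card.expiry_year, card.expiry_month):
        return False, "card expired"
    if amount_cents > card.available_balance_cents:
        return False, "insufficient funds"
    return True, "approved"


card = Card("4111111111111111", 2030, 6, available_balance_cents=5_000)
print(authorize(card, 1_000, date(2030, 6, 30)))  # (True, 'approved')
```

The accept or reject result is what travels back through the Card Association and Payment Processor to the merchant in Step 7.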
Now let's discuss the detailed component design
Payment Service:
- Responsibilities: Orchestrates the payment process, receives payment events from users, performs risk checks, coordinates payment execution, and updates relevant systems.
- Key Features:
- Risk Assessment: Utilizes third-party services to conduct risk assessments, ensuring compliance with regulations and detecting fraudulent activities.
- Transaction Handling: Manages the lifecycle of payment transactions, including authorization, settlement, and reconciliation.
- Integration: Integrates with PSPs, external APIs, and internal services to facilitate payment processing and data exchange.
- Event Logging: Logs payment events, errors, and system activities for auditing and troubleshooting purposes.
- Technologies: Microservices architecture, RESTful APIs, asynchronous messaging (e.g., Kafka), database storage (e.g., PostgreSQL), and containerization (e.g., Docker).
Payment Executor:
- Responsibilities: Executes individual payment orders on behalf of the Payment Service, interacts with PSPs to process payments, and updates payment statuses.
- Key Features:
- Payment Order Management: Stores and manages payment orders, including details such as amount, currency, payment method, and transaction status.
- PSP Integration: Communicates with PSPs via secure APIs to authorize and process payments, handle redirects for 3D Secure authentication, and retrieve transaction statuses.
- Error Handling: Implements robust error handling mechanisms to handle communication failures, transaction errors, and retries.
- Transaction Logging: Logs transaction details, responses from PSPs, and error messages for auditing and reconciliation.
- Technologies: Microservices architecture, distributed transactions, secure API communication (e.g., OAuth), payment gateway integration (e.g., Stripe, PayPal), and logging frameworks (e.g., ELK stack).
Double-Entry Ledger System
The double-entry system states that the sum of all the transaction entries must be 0: one cent lost by one account means one cent gained by another. It provides end-to-end traceability and ensures consistency throughout the payment cycle. The double-entry system is fundamental to any payment system and is key to accurate bookkeeping. It records every payment transaction in two separate ledger accounts with the same amount.
Overview:
- Foundational accounting method ensuring accuracy in financial records.
- Each transaction involves at least two accounts: a debit and a credit.
- Maintains balance between debits and credits, adhering to the accounting equation: assets = liabilities + equity.
Application in Payment Systems:
- Used to record transactions involving movement of funds between accounts.
- Each payment transaction results in debits to one account (e.g., payer's account) and credits to another account (e.g., payee's account).
- Ensures accuracy and completeness of financial records by maintaining balance between debits and credits.
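The zero-sum rule can be enforced at the point where entries are posted. The sketch below is illustrative only: the `Ledger` and `Entry` names are made up, and it assumes a sign convention in which debits are negative amounts and credits are positive amounts.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account: str
    amount: Decimal  # assumed convention: negative = debit, positive = credit


class Ledger:
    def __init__(self):
        self._entries: list[Entry] = []

    def post(self, entries: list[Entry]) -> None:
        """Record one transaction; reject it unless its entries sum to zero."""
        if len(entries) < 2:
            raise ValueError("a transaction needs at least two entries")
        if sum(e.amount for e in entries) != 0:
            raise ValueError("entries must sum to zero")
        self._entries.extend(entries)

    def balance(self, account: str) -> Decimal:
        return sum(
            (e.amount for e in self._entries if e.account == account),
            Decimal("0"),
        )

    def total(self) -> Decimal:
        """Sum over every entry in the ledger; always zero if post() is the only writer."""
        return sum((e.amount for e in self._entries), Decimal("0"))


ledger = Ledger()
ledger.post([Entry("buyer", Decimal("-10.00")), Entry("seller", Decimal("10.00"))])
print(ledger.balance("seller"), ledger.total())  # 10.00 0.00
```

Because unbalanced transactions are rejected up front, a non-zero `total()` would indicate that entries were written by some other path, which is the kind of discrepancy reconciliation is meant to catch.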
Reconciliation
In an asynchronous communication environment, where messages may not be delivered or responses returned, ensuring correctness becomes crucial. Reconciliation serves as a practice to periodically compare states among related services, verifying their agreement, and acting as the last line of defense in payment systems. Every night the PSP or banks send a settlement file to their clients. The settlement file contains the balance of the bank account, together with all the transactions that took place on this bank account during the day. The reconciliation system parses the settlement file and compares the details with the ledger system. Reconciliation is also used to verify that the payment system is internally consistent. For example, the states in the ledger and wallet might diverge and we could use the reconciliation system to detect any discrepancy.
Importance of Reconciliation:
- Accuracy Assurance: Validates that transaction records match across different systems, ensuring accuracy in financial data.
- Error Detection: Identifies discrepancies or errors in transaction records, allowing prompt resolution and preventing financial discrepancies.
- Compliance Adherence: Ensures compliance with regulatory requirements and industry standards by verifying the accuracy of financial records.
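A minimal sketch of the comparison step is shown below. It assumes both the settlement file and the ledger have already been parsed into dictionaries keyed by transaction ID with amounts in cents; the `reconcile` function and its result keys are hypothetical names.

```python
def reconcile(ledger: dict[str, int], settlement: dict[str, int]) -> dict[str, list]:
    """Compare ledger records with a parsed settlement file (amounts in cents)."""
    missing_in_ledger = sorted(settlement.keys() - ledger.keys())
    missing_in_settlement = sorted(ledger.keys() - settlement.keys())
    amount_mismatches = sorted(
        (tx_id, ledger[tx_id], settlement[tx_id])
        for tx_id in ledger.keys() & settlement.keys()
        if ledger[tx_id] != settlement[tx_id]
    )
    return {
        "missing_in_ledger": missing_in_ledger,
        "missing_in_settlement": missing_in_settlement,
        "amount_mismatches": amount_mismatches,
    }


report = reconcile(
    ledger={"t1": 1000, "t2": 2500, "t3": 400},
    settlement={"t1": 1000, "t2": 2400, "t4": 900},
)
print(report)
```

Each non-empty list in the report is a discrepancy to investigate; how it is resolved (automatic adjustment or manual review) is a separate policy decision.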
Handling payment processing delays
An end-to-end payment request flows through many components and involves both internal and external parties. While in most cases a payment request would complete in seconds, there are situations where a payment request would stall and sometimes take hours or days before it is completed or rejected. Here are some examples where a payment request could take longer than usual:
- The PSP deems a payment request high risk and requires a human to review it.
- A credit card requires extra protection like 3D Secure Authentication [13] which requires extra details from a card-holder to verify a purchase.
The payment service must be able to handle these payment requests that take a long time to process. If the buy page is hosted by an external PSP, which is quite common these days, the PSP would handle these long-running payment requests in the following ways:
- The PSP would return a pending status to our client. Our client would display that to the user. Our client would also provide a page for the customer to check the current payment status.
- The PSP tracks the pending payment on our behalf, and notifies the payment service of any status update via the webhook the payment service registered with the PSP.
When the payment request is finally completed, the PSP calls the registered webhook mentioned above. The payment service updates its internal system and completes the shipment to the customer.
Network Failures During Fund Transfers:
Network failures can disrupt fund transfers, leading to transaction delays or failures. Common causes include network congestion, server outages, or connectivity issues between payment gateways, banks, and other financial institutions.
Strategies to Mitigate Network Failures:
- Redundant Network Infrastructure: Implement redundant network connections and failover mechanisms to ensure continuous connectivity and minimize the impact of network failures.
- Transaction Monitoring: Use real-time transaction monitoring systems to detect network issues and proactively address them to prevent transaction failures.
- Retry Mechanisms: Implement automatic retry mechanisms to reattempt failed transactions after a network failure, reducing the risk of transaction delays.
- Error Handling and Recovery: Develop robust error handling mechanisms to identify and recover from network failures gracefully, ensuring data integrity and transaction consistency.
- Transaction Queuing: Queue pending transactions during network outages and process them once connectivity is restored, maintaining transaction order and integrity.
Handling failed transactions
Handling failed transactions in a payment system is crucial for ensuring reliability and fault tolerance. Here's how we can tackle these challenges:
Tracking Payment State:
- Definitive Payment State: Maintaining a definitive payment state at every stage of the payment cycle enables us to determine the current state of a transaction in case of failure.
- Persistence: Persisting payment states in an append-only database table ensures that transaction status can be accurately tracked and managed.
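One way to picture an append-only payment state table is the in-memory sketch below. The state names and allowed transitions are assumptions chosen for illustration, and a list stands in for the database table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical state machine: which states may follow the current one.
ALLOWED = {
    None: {"NOT_STARTED"},
    "NOT_STARTED": {"EXECUTING"},
    "EXECUTING": {"SUCCESS", "FAILED"},
    "FAILED": {"EXECUTING"},  # a failed payment may be retried
    "SUCCESS": set(),         # terminal state
}


@dataclass(frozen=True)
class StateRecord:
    payment_id: str
    state: str
    recorded_at: datetime


class PaymentStateLog:
    def __init__(self):
        self._rows: list[StateRecord] = []  # rows are appended, never updated

    def current_state(self, payment_id: str):
        # The latest row for a payment is its definitive state.
        for row in reversed(self._rows):
            if row.payment_id == payment_id:
                return row.state
        return None

    def record(self, payment_id: str, new_state: str) -> None:
        current = self.current_state(payment_id)
        if new_state not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self._rows.append(StateRecord(payment_id, new_state, datetime.now(timezone.utc)))

    def history(self, payment_id: str) -> list[str]:
        return [r.state for r in self._rows if r.payment_id == payment_id]
```

After a crash, reading the latest row for a payment tells the service whether to retry, refund, or do nothing, and the full history remains available for audits.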
Retry Queue and Dead Letter Queue:
- Retry Queue: Retryable errors, such as transient network issues, are routed to a retry queue for subsequent retry attempts.
- Dead Letter Queue: Messages that fail repeatedly end up in the dead letter queue, facilitating debugging and isolation of problematic messages for inspection.
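The interplay of the two queues can be sketched in memory as follows. `RetryableError`, `process_with_retries`, and the limit of three attempts are assumptions for this example; a real deployment would use the queueing features of its message broker, and in this sketch any other exception simply propagates to the caller.

```python
from collections import deque


class RetryableError(Exception):
    """Raised by a handler for transient failures such as a network timeout."""


def process_with_retries(messages, handler, max_attempts=3):
    """Process messages, re-queueing transient failures; return the dead letters."""
    retry_queue = deque((msg, 1) for msg in messages)
    dead_letters = []
    while retry_queue:
        msg, attempt = retry_queue.popleft()
        try:
            handler(msg)
        except RetryableError:
            if attempt < max_attempts:
                # Put the message back on the retry queue for another attempt.
                retry_queue.append((msg, attempt + 1))
            else:
                # Repeated failure: park the message for inspection.
                dead_letters.append(msg)
    return dead_letters
```

The sketch retries without waiting; in practice each re-queue would be delayed according to one of the retry strategies described next.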
Retry Strategies:
- Immediate Retry: Resending the request immediately upon failure.
- Fixed Intervals: Waiting a fixed amount of time between the failed payment and subsequent retry attempts.
- Incremental Intervals: Gradually increasing the time between retry attempts.
- Exponential Backoff: Doubling the waiting time between retries after each failed attempt to prevent overwhelming the system with retry attempts.
- Cancellation: Canceling the request if the failure is deemed permanent or if repeated attempts are unlikely to succeed.
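The waiting times produced by the first four strategies can be compared with a small helper. The function name, the strategy labels, and the 1-second base interval are illustrative assumptions.

```python
def retry_delays(strategy: str, attempts: int, base: float = 1.0) -> list[float]:
    """Seconds to wait before each retry attempt under the given strategy."""
    if strategy == "immediate":
        return [0.0] * attempts
    if strategy == "fixed":
        return [base] * attempts
    if strategy == "incremental":
        return [base * (i + 1) for i in range(attempts)]
    if strategy == "exponential":
        return [base * 2 ** i for i in range(attempts)]
    raise ValueError(f"unknown strategy: {strategy}")


print(retry_delays("incremental", 4))  # [1.0, 2.0, 3.0, 4.0]
print(retry_delays("exponential", 4))  # [1.0, 2.0, 4.0, 8.0]
```

Cancellation is not a delay schedule; it corresponds to stopping the sequence early once the failure is judged permanent.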
Example of Retry:
- Scenario: A client attempts to make a $10 payment, but the request fails due to a poor network connection.
- Retry Attempts: The client retries the payment request multiple times until the network connection eventually recovers, ensuring the transaction's successful completion.
- Retry Interval Decision: Deciding appropriate time intervals between retries is crucial to balance between ensuring transaction success and avoiding overwhelming the system with retry attempts.
Trade-offs/Tech choices
Trade-off: Consistency vs. Availability
Explanation: For data that can tolerate brief staleness (for example notifications, logs, and analytics), we prioritize availability over strong consistency to ensure uninterrupted service, since downtime in a financial system can lead to significant losses. Account balances and ledger entries remain in the strongly consistent SQL store described in the database design.
Tech Choices: For the availability-focused data we utilize distributed databases like Apache Cassandra or Amazon DynamoDB, which offer high availability and partition tolerance with eventual consistency.
Trade-off: Latency vs. Cost in Payment Processing
Explanation: We prioritize low latency in payment processing to enhance user experience and drive increased conversion rates, balancing this with managing operational expenses effectively. This ensures a fast, responsive payment experience while optimizing operational costs.
Tech Choices: To minimize latency, we employ in-memory caching solutions like Redis or Memcached and leverage content delivery networks (CDNs) to cache static assets, optimizing performance without compromising on cost-effectiveness.
Future improvements
Machine Learning for Fraud Detection:
- Implementing machine learning algorithms for fraud detection can enhance the accuracy and efficiency of detecting fraudulent transactions. By analyzing large volumes of transaction data, machine learning models can identify patterns and anomalies indicative of fraudulent behavior.
- Utilizing advanced techniques such as supervised learning for classification and anomaly detection, alongside unsupervised learning for outlier detection, can improve the system's ability to distinguish between legitimate and fraudulent transactions, reducing false positives and negatives.
Enhanced Security and Transparency:
- Implementing blockchain technology can significantly enhance security and transparency in the payment system by creating an immutable and decentralized ledger of transactions.
- Each transaction recorded on the blockchain is cryptographically secured, providing tamper-proof data integrity and ensuring transparent audit trails for every payment transaction.