q: Design an Online Payment Service

Develop a secure and reliable online payment system similar to PayPal, enabling users to send and receive payments electronically. Design interfaces for account management, fund transfers, and payment processing. Implement features like fraud detection, buyer and seller protection, and multi-currency support to ensure secure and seamless transactions for users worldwide.

Requirements


Functional Requirements:

  • Must securely handle online money transfers
  • Must be fraud resistant (track things like transaction amounts , time of transactions, login patterns, user behaviour metrics, device type, ip address)
  • Multi currency support
  • Buyer and seller protection (Dispute system, secure payment, chargeback protection, account security, transparent policies)



Non-Functional Requirements:

  • Must be partition tolerant
  • Must be highly available for non critical user operations
  • Core financial transaction must be strongly consistent and reject requests during data partitions to prevent incorrect balances and double spending


DATABASE DESIGN


User{

id: bigint,

user_email: str,

password: str (hashed and salted),

bucket_id: bigint,

}


Fraud_signal{

id: bigint,

user_id: bigint,

signal_type: enum(ip, device, velocity),

value: string,

observed_at: timestamp

}


Device{

id: bigint,

type: enum(mobile,desktop, etc),

last_login: timestamp,

last_login_ip: str,

user_id: bigint

}


FundingOption{

id: bigint,

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}


Ledger inserts are idempotent

ie. No matter how many times the same transfer request is retried, we will only write into ledger once, most likely by checking if a ledger exists with a certain transfer id(which is unique)

LedgerEntry{

id:bigInt,

user_id: bigint,

account_id: bigint,

amount_minor_units: bigint, (need to use signed amounts for future querying)

transfer_id: bigint (this is globally unique)

currency: enum(USD,CAD,....),

created_at: timestamp


UNIQUE(account_id, transfer_id)

}


Account{

id: bigint,

user_id: bigint,

currency: enum(USD,CAD,...),

account_status: enum(active, frozen, pending funds etc),

funding_option_id: bigint,

balance_minor_units: bigint

}


Transfer{

source_account_id: bigint,

destination_account_id: bigint,

amount_minor_units: bigint,

Unique(idempotency_key): bigint,

created_at: timestamp,

status: enum(completed, pending, on hold, etc)

}


Event{

id: bigint,

event_name: enum(login,fraudsignal,transfer, ...),

event_payload: JSON,

schema_version: int,

created_at: timestamp

}


This table is used to handle provider withdrawal/deposits

ProviderTransaction {

id: bigint

provider: enum(bank, stripe, card_network)

provider_reference_id: string

transfer_id: bigint // internal linkage

amount_minor_units: bigint

currency: enum

status: enum(settled, failed, reversed)

occurred_at: timestamp

}


Created when we discover an discrepancy between providertransaction and our own ledger/transfer table

ReconciliationIssue {

id

transfer_id

provider_reference_id

issue_type

status

detected_at

}



We store events with a schema version for tables like login or fraudsignal that may later be updated with a newer version to its data structure

CAPACITY ESTIMATION:

For a 10 year lifespan and assume 10k tps and around 10 million accounts.

10,000 transfers / second

× 60 seconds

× 60 minutes

× 24 hours

× 365 days

= 315,360,000,000 transfers per year


2 ledger entries per transfer so 6.3 trillion ledger entries

1 transfer entry per transfer -> 3.15 trillion transfer entries


9.45 trillon entries * 100 bytes per entry = 9.45e+14 bytes


So need 945 TB of storage over 10 years


API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...


POST /api/login

parameters: {user_email, password, ip_address}


This lets a user login, we will use the user's token bucket to rate limit logins and send a fraud signal if too many failed attempts were made. If the user types in the correct credentials, assign them a login token and show them their home page. We also create a login event and store this in the event table, if our login event deviates too much from a certain amount of past logins, we can create a fraud signal.


Server derives:

  • User IP address
  • Geolocation data
  • Device fingerprint
  • Timestamp
  • Login success


Our fraud engine will compare this data with past login events and emit a fraudsignal event if fraudulent login is detected.


Example:

Event {

event_type: FRAUD_SIGNAL

schema_version: 1

payload: {

user_id,

signal_type: "NEW_COUNTRY_LOGIN",

severity: "HIGH",

related_event_id

}

}


Workers consume this fraud signal event asynchronously and send this information to a notification service to send the appropriate notification to a user's email.


GET /api/accounts


User id obtained from login token


  • Fetch all accounts owned by this user_id
  • Use user_id in cache key and store account in cache
  • Invalidate cache key for a certain account if its updated in ledger


GET /api/accounts/{account_id}


  • User ID obtained from login token
  • Verify that the requested account belongs to the authenticated user, then query ledger entries by account_id
  • Save ledger results in paginated cache key which looks like ledger:{account_id}:{cursor}


  • Cache value looks like {

"ledger_entries": [...],

"next_cursor": "97500", -> oldest ledger id in entries

"has_more": true }


  • Cache invalidation is triggered by ledger writes, upon creating a new ledger record, the affected account has their cache invalidated for that count


POST /api/funding_options/


Auth:

  • Requires authenticated user
  • user_id derived from login token


parameters {

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}


  • Allow a user to add a new funding account to their profile, this will require funding account information
  • Uses provider SDK/OAuth to authenticate user login to this funding account
  • Provider returns secure token and metadata (last4 digits and expiry info)
  • Card/bank details never stored directly


POST /api/transfers

parameters{

source_account_id: bigint,

destination_account_id:bigint,

amount_minor_units: bigint,

idempotency_key: bigint

}

Auth:

  • Requires authenticated user
  • user_id derived from auth token


  • Verify source account belongs to user
  • Verify destination account exists and is active
  • Initiate DB transaction
  • We atomically verify and update the balance inside the database transaction
  • We create a transfer record of this transaction with its idempotency key to ensure double transactions are not made. We dont make this an event because incase the db rolls back the transaction, we don't want to have an event for it
  • We emit a ledgerEntry event to write to this table for both the source and destination accounts
  • Invalidate cache for both accounts
  • Have an ML fraud detection service score the user internally based on the transaction, if it is abnormally large then score user based on the transaction being large, if other flags have been raised then stop the transaction from going through and alert user



RACE CONDITION CONSIDERATION:

Race conditions are prevented by enforcing the balance invariant with an atomic conditional update on the account row.


FAILURE CASES:


1. Failure inside the DB transaction

Example:

  • Balance deduction succeeds
  • Ledger insert fails
  • Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

  • Balance deduction is undone
  • No ledger entries exist
  • No partial state


2. Failure after the DB transaction commits

Example:

  • Balance deducted
  • Ledger entries written
  • Transfer committed
  • Downstream step fails (notification, external call, etc.)

Or more realistically:

  • External payout fails
  • Settlement fails later
  • Reversal / chargeback occurs

At this point:

  • You cannot rollback
  • State is already committed

This is where compensating transactions come in.


All balance changes are recorded as immutable ledger entries. If a transfer fails after committing, we issue compensating ledger entries rather than rolling back state. This preserves auditability and ensures balances remain correct.


FRUAD DETECTION:

We can make use of an ML service that ingests the user's id and their transaction history as well as login history to create a score to deny access to the application and alert the user's email. This will be done internally after a transaction has been attempted and potentially blocked if other fraudulent activities have been flagged like new login location etc


High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.




Detailed Component Design

login -> We can add a ML service for fraud detection, for example upon a fraudulent login we can give the user's login a score:

NEW_DEVICE +20

NEW_COUNTRY_LOGIN +30

IMPOSSIBLE_TRAVEL +50

HIGH_VALUE_TRANSFER +40


For logins under a score of 30 allow, for logins above 30 but below 70 initiate a two factor authentication and for logins above 70 deny access and alert the user's email


Failure scenarios


FAILURE CASES:


1. Failure inside the DB transaction

Example:

  • Balance deduction succeeds
  • Ledger insert fails
  • Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

  • Balance deduction is undone
  • No ledger entries exist
  • No partial state


2. Failure after the DB transaction commits

Example:

  • Balance deducted
  • Ledger entries written
  • Transfer committed
  • Downstream step fails (notification, external call, etc.)

Or more realistically:

  • External payout fails
  • Settlement fails later
  • Reversal / chargeback occurs

At this point:

  • You cannot rollback
  • State is already committed

This is where compensating transactions come in.


All balance changes are recorded as immutable ledger entries. If a transfer fails after committing, we issue compensating ledger entries rather than rolling back state. This preserves auditability and ensures balances remain correct.


Trade offs:

I would use a relational database for this because we will be receiving well defined data for each table. I also want a strongly consistent database to ensure ledger and balances are correct


I traded off storage and complexity for fast and safe balance checks by using a ledger + balance snapshot that


I enforce idempotency on the transfer layer instead of ledger layer. This prevents duplicate money movement and clearly separates concerns but increases schema complexity for transfer


I also made my ledger append only instead of mutable, this ensures the ledger is my source of truth by providing a full audit trail and helps with easy rollbacks but causes our storage to be used up very fast


I also chose to partition my storage instead of using a monolithic DB. This is scalable to trillions of entries but makes cross partition queries harder