Design An Online Payment Service - System Design

q: Design an Online Payment Service

Develop a secure and reliable online payment system similar to PayPal, enabling users to send and receive payments electronically. Design interfaces for account management, fund transfers, and payment processing. Implement features like fraud detection, buyer and seller protection, and multi-currency support to ensure secure and seamless transactions for users worldwide.

Requirements

Functional Requirements:

Must securely handle online money transfers
Must be fraud resistant (track things like transaction amounts , time of transactions, login patterns, user behaviour metrics, device type, ip address)
Multi currency support
Buyer and seller protection (Dispute system, secure payment, chargeback protection, account security, transparent policies)

Non-Functional Requirements:

Must be partition tolerant
Must be highly available for non critical user operations
Core financial transaction must be strongly consistent and reject requests during data partitions to prevent incorrect balances and double spending

DATABASE DESIGN

User{

id: bigint,

user_email: str,

password: str (hashed and salted),

bucket_id: bigint,

}

Fraud_signal{

id: bigint,

user_id: bigint,

signal_type: enum(ip, device, velocity),

value: string,

observed_at: timestamp

}

Device{

id: bigint,

type: enum(mobile,desktop, etc),

last_login: timestamp,

last_login_ip: str,

user_id: bigint

}

FundingOption{

id: bigint,

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}

Ledger inserts are idempotent

ie. No matter how many times the same transfer request is retried, we will only write into ledger once, most likely by checking if a ledger exists with a certain transfer id(which is unique)

LedgerEntry{

id:bigInt,

user_id: bigint,

account_id: bigint,

amount_minor_units: bigint, (need to use signed amounts for future querying)

transfer_id: bigint (this is globally unique)

currency: enum(USD,CAD,....),

created_at: timestamp

UNIQUE(account_id, transfer_id)

}

Account{

id: bigint,

user_id: bigint,

currency: enum(USD,CAD,...),

account_status: enum(active, frozen, pending funds etc),

funding_option_id: bigint,

balance_minor_units: bigint

}

Transfer{

source_account_id: bigint,

destination_account_id: bigint,

amount_minor_units: bigint,

Unique(idempotency_key): bigint,

created_at: timestamp,

status: enum(completed, pending, on hold, etc)

}

Event{

id: bigint,

event_name: enum(login,fraudsignal,transfer, ...),

event_payload: JSON,

schema_version: int,

created_at: timestamp

}

This table is used to handle provider withdrawal/deposits

ProviderTransaction {

id: bigint

provider: enum(bank, stripe, card_network)

provider_reference_id: string

transfer_id: bigint // internal linkage

amount_minor_units: bigint

currency: enum

status: enum(settled, failed, reversed)

occurred_at: timestamp

}

Created when we discover an discrepancy between providertransaction and our own ledger/transfer table

ReconciliationIssue {

transfer_id

provider_reference_id

issue_type

status

detected_at

}

We store events with a schema version for tables like login or fraudsignal that may later be updated with a newer version to its data structure

CAPACITY ESTIMATION:

For a 10 year lifespan and assume 10k tps and around 10 million accounts.

10,000 transfers / second

× 60 seconds

× 60 minutes

× 24 hours

× 365 days

= 315,360,000,000 transfers per year

2 ledger entries per transfer so 6.3 trillion ledger entries

1 transfer entry per transfer -> 3.15 trillion transfer entries

9.45 trillon entries * 100 bytes per entry = 9.45e+14 bytes

So need 945 TB of storage over 10 years

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

POST /api/login

parameters: {user_email, password, ip_address}

This lets a user login, we will use the user's token bucket to rate limit logins and send a fraud signal if too many failed attempts were made. If the user types in the correct credentials, assign them a login token and show them their home page. We also create a login event and store this in the event table, if our login event deviates too much from a certain amount of past logins, we can create a fraud signal.

Server derives:

User IP address
Geolocation data
Device fingerprint
Timestamp
Login success

Our fraud engine will compare this data with past login events and emit a fraudsignal event if fraudulent login is detected.

Example:

Event {

event_type: FRAUD_SIGNAL

schema_version: 1

payload: {

user_id,

signal_type: "NEW_COUNTRY_LOGIN",

severity: "HIGH",

related_event_id

}

Workers consume this fraud signal event asynchronously and send this information to a notification service to send the appropriate notification to a user's email.

GET /api/accounts

User id obtained from login token

Fetch all accounts owned by this user_id
Use user_id in cache key and store account in cache
Invalidate cache key for a certain account if its updated in ledger

GET /api/accounts/{account_id}

User ID obtained from login token
Verify that the requested account belongs to the authenticated user, then query ledger entries by account_id
Save ledger results in paginated cache key which looks like ledger:{account_id}:{cursor}

Cache value looks like {

"ledger_entries": [...],

"next_cursor": "97500", -> oldest ledger id in entries

"has_more": true }

Cache invalidation is triggered by ledger writes, upon creating a new ledger record, the affected account has their cache invalidated for that count

POST /api/funding_options/

Auth:

Requires authenticated user
user_id derived from login token

parameters {

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}

Allow a user to add a new funding account to their profile, this will require funding account information
Uses provider SDK/OAuth to authenticate user login to this funding account
Provider returns secure token and metadata (last4 digits and expiry info)
Card/bank details never stored directly

POST /api/transfers

parameters{

source_account_id: bigint,

destination_account_id:bigint,

amount_minor_units: bigint,

idempotency_key: bigint

}

Auth:

Requires authenticated user
user_id derived from auth token

Verify source account belongs to user
Verify destination account exists and is active
Initiate DB transaction
We atomically verify and update the balance inside the database transaction
We create a transfer record of this transaction with its idempotency key to ensure double transactions are not made. We dont make this an event because incase the db rolls back the transaction, we don't want to have an event for it
We emit a ledgerEntry event to write to this table for both the source and destination accounts
Invalidate cache for both accounts
Have an ML fraud detection service score the user internally based on the transaction, if it is abnormally large then score user based on the transaction being large, if other flags have been raised then stop the transaction from going through and alert user

RACE CONDITION CONSIDERATION:

Race conditions are prevented by enforcing the balance invariant with an atomic conditional update on the account row.

FAILURE CASES:

1. Failure inside the DB transaction

Example:

Balance deduction succeeds
Ledger insert fails
Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

Balance deduction is undone
No ledger entries exist
No partial state

2. Failure after the DB transaction commits

Example:

Balance deducted
Ledger entries written
Transfer committed
Downstream step fails (notification, external call, etc.)

Or more realistically:

External payout fails
Settlement fails later
Reversal / chargeback occurs

At this point:

You cannot rollback
State is already committed

This is where compensating transactions come in.

All balance changes are recorded as immutable ledger entries. If a transfer fails after committing, we issue compensating ledger entries rather than rolling back state. This preserves auditability and ensures balances remain correct.

FRUAD DETECTION:

We can make use of an ML service that ingests the user's id and their transaction history as well as login history to create a score to deny access to the application and alert the user's email. This will be done internally after a transaction has been attempted and potentially blocked if other fraudulent activities have been flagged like new login location etc

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

Detailed Component Design

login -> We can add a ML service for fraud detection, for example upon a fraudulent login we can give the user's login a score:

NEW_DEVICE +20

NEW_COUNTRY_LOGIN +30

IMPOSSIBLE_TRAVEL +50

HIGH_VALUE_TRANSFER +40

For logins under a score of 30 allow, for logins above 30 but below 70 initiate a two factor authentication and for logins above 70 deny access and alert the user's email

Failure scenarios

FAILURE CASES:

1. Failure inside the DB transaction

Example:

Balance deduction succeeds
Ledger insert fails
Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

Balance deduction is undone
No ledger entries exist
No partial state

2. Failure after the DB transaction commits

Example:

Balance deducted
Ledger entries written
Transfer committed
Downstream step fails (notification, external call, etc.)

Or more realistically:

External payout fails
Settlement fails later
Reversal / chargeback occurs

At this point:

You cannot rollback
State is already committed

This is where compensating transactions come in.

Trade offs:

I would use a relational database for this because we will be receiving well defined data for each table. I also want a strongly consistent database to ensure ledger and balances are correct

I traded off storage and complexity for fast and safe balance checks by using a ledger + balance snapshot that

I enforce idempotency on the transfer layer instead of ledger layer. This prevents duplicate money movement and clearly separates concerns but increases schema complexity for transfer

I also made my ledger append only instead of mutable, this ensures the ledger is my source of truth by providing a full audit trail and helps with easy rollbacks but causes our storage to be used up very fast

I also chose to partition my storage instead of using a monolithic DB. This is scalable to trillions of entries but makes cross partition queries harder

q: Design an Online Payment Service

Requirements

Functional Requirements:

Must securely handle online money transfers
Must be fraud resistant (track things like transaction amounts , time of transactions, login patterns, user behaviour metrics, device type, ip address)
Multi currency support
Buyer and seller protection (Dispute system, secure payment, chargeback protection, account security, transparent policies)

Non-Functional Requirements:

Must be partition tolerant
Must be highly available for non critical user operations
Core financial transaction must be strongly consistent and reject requests during data partitions to prevent incorrect balances and double spending

DATABASE DESIGN

User{

id: bigint,

user_email: str,

password: str (hashed and salted),

bucket_id: bigint,

}

Fraud_signal{

id: bigint,

user_id: bigint,

signal_type: enum(ip, device, velocity),

value: string,

observed_at: timestamp

}

Device{

id: bigint,

type: enum(mobile,desktop, etc),

last_login: timestamp,

last_login_ip: str,

user_id: bigint

}

FundingOption{

id: bigint,

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}

Ledger inserts are idempotent

ie. No matter how many times the same transfer request is retried, we will only write into ledger once, most likely by checking if a ledger exists with a certain transfer id(which is unique)

LedgerEntry{

id:bigInt,

user_id: bigint,

account_id: bigint,

amount_minor_units: bigint, (need to use signed amounts for future querying)

transfer_id: bigint (this is globally unique)

currency: enum(USD,CAD,....),

created_at: timestamp

UNIQUE(account_id, transfer_id)

}

Account{

id: bigint,

user_id: bigint,

currency: enum(USD,CAD,...),

account_status: enum(active, frozen, pending funds etc),

funding_option_id: bigint,

balance_minor_units: bigint

}

Transfer{

source_account_id: bigint,

destination_account_id: bigint,

amount_minor_units: bigint,

Unique(idempotency_key): bigint,

created_at: timestamp,

status: enum(completed, pending, on hold, etc)

}

Event{

id: bigint,

event_name: enum(login,fraudsignal,transfer, ...),

event_payload: JSON,

schema_version: int,

created_at: timestamp

}

This table is used to handle provider withdrawal/deposits

ProviderTransaction {

id: bigint

provider: enum(bank, stripe, card_network)

provider_reference_id: string

transfer_id: bigint // internal linkage

amount_minor_units: bigint

currency: enum

status: enum(settled, failed, reversed)

occurred_at: timestamp

}

Created when we discover an discrepancy between providertransaction and our own ledger/transfer table

ReconciliationIssue {

transfer_id

provider_reference_id

issue_type

status

detected_at

}

We store events with a schema version for tables like login or fraudsignal that may later be updated with a newer version to its data structure

CAPACITY ESTIMATION:

For a 10 year lifespan and assume 10k tps and around 10 million accounts.

10,000 transfers / second

× 60 seconds

× 60 minutes

× 24 hours

× 365 days

= 315,360,000,000 transfers per year

2 ledger entries per transfer so 6.3 trillion ledger entries

1 transfer entry per transfer -> 3.15 trillion transfer entries

9.45 trillon entries * 100 bytes per entry = 9.45e+14 bytes

So need 945 TB of storage over 10 years

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

POST /api/login

parameters: {user_email, password, ip_address}

Server derives:

User IP address
Geolocation data
Device fingerprint
Timestamp
Login success

Our fraud engine will compare this data with past login events and emit a fraudsignal event if fraudulent login is detected.

Example:

Event {

event_type: FRAUD_SIGNAL

schema_version: 1

payload: {

user_id,

signal_type: "NEW_COUNTRY_LOGIN",

severity: "HIGH",

related_event_id

}

Workers consume this fraud signal event asynchronously and send this information to a notification service to send the appropriate notification to a user's email.

GET /api/accounts

User id obtained from login token

Fetch all accounts owned by this user_id
Use user_id in cache key and store account in cache
Invalidate cache key for a certain account if its updated in ledger

GET /api/accounts/{account_id}

User ID obtained from login token
Verify that the requested account belongs to the authenticated user, then query ledger entries by account_id
Save ledger results in paginated cache key which looks like ledger:{account_id}:{cursor}

Cache value looks like {

"ledger_entries": [...],

"next_cursor": "97500", -> oldest ledger id in entries

"has_more": true }

Cache invalidation is triggered by ledger writes, upon creating a new ledger record, the affected account has their cache invalidated for that count

POST /api/funding_options/

Auth:

Requires authenticated user
user_id derived from login token

parameters {

provider: enum(Bank, stripe, etc)

token: string,

last4: string,

expiry_month: str,

expiry_year: str

}

Allow a user to add a new funding account to their profile, this will require funding account information
Uses provider SDK/OAuth to authenticate user login to this funding account
Provider returns secure token and metadata (last4 digits and expiry info)
Card/bank details never stored directly

POST /api/transfers

parameters{

source_account_id: bigint,

destination_account_id:bigint,

amount_minor_units: bigint,

idempotency_key: bigint

}

Auth:

Requires authenticated user
user_id derived from auth token

Verify source account belongs to user
Verify destination account exists and is active
Initiate DB transaction
We atomically verify and update the balance inside the database transaction
We create a transfer record of this transaction with its idempotency key to ensure double transactions are not made. We dont make this an event because incase the db rolls back the transaction, we don't want to have an event for it
We emit a ledgerEntry event to write to this table for both the source and destination accounts
Invalidate cache for both accounts
Have an ML fraud detection service score the user internally based on the transaction, if it is abnormally large then score user based on the transaction being large, if other flags have been raised then stop the transaction from going through and alert user

RACE CONDITION CONSIDERATION:

Race conditions are prevented by enforcing the balance invariant with an atomic conditional update on the account row.

FAILURE CASES:

1. Failure inside the DB transaction

Example:

Balance deduction succeeds
Ledger insert fails
Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

Balance deduction is undone
No ledger entries exist
No partial state

2. Failure after the DB transaction commits

Example:

Balance deducted
Ledger entries written
Transfer committed
Downstream step fails (notification, external call, etc.)

Or more realistically:

External payout fails
Settlement fails later
Reversal / chargeback occurs

At this point:

You cannot rollback
State is already committed

This is where compensating transactions come in.

FRUAD DETECTION:

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

Detailed Component Design

login -> We can add a ML service for fraud detection, for example upon a fraudulent login we can give the user's login a score:

NEW_DEVICE +20

NEW_COUNTRY_LOGIN +30

IMPOSSIBLE_TRAVEL +50

HIGH_VALUE_TRANSFER +40

For logins under a score of 30 allow, for logins above 30 but below 70 initiate a two factor authentication and for logins above 70 deny access and alert the user's email

Failure scenarios

FAILURE CASES:

1. Failure inside the DB transaction

Example:

Balance deduction succeeds
Ledger insert fails
Transfer record insert fails

What happens?

The database transaction rolls back.

Result:

Balance deduction is undone
No ledger entries exist
No partial state

2. Failure after the DB transaction commits

Example:

Balance deducted
Ledger entries written
Transfer committed
Downstream step fails (notification, external call, etc.)

Or more realistically:

External payout fails
Settlement fails later
Reversal / chargeback occurs

At this point:

You cannot rollback
State is already committed

This is where compensating transactions come in.

Trade offs:

I would use a relational database for this because we will be receiving well defined data for each table. I also want a strongly consistent database to ensure ledger and balances are correct

I traded off storage and complexity for fast and safe balance checks by using a ledger + balance snapshot that

I enforce idempotency on the transfer layer instead of ledger layer. This prevents duplicate money movement and clearly separates concerns but increases schema complexity for transfer

I also chose to partition my storage instead of using a monolithic DB. This is scalable to trillions of entries but makes cross partition queries harder