Design An Online Payment Service - System Design

Requirements

Functional Requirements:

Allow users to send money and receive payments
Enable a user onboarding flow, including KYC and account setup
Enable fraud detection using AML checks and ML algorithms
Ensure that buyer and seller data is protected
Implement multi-currency support with conversion
Preserve transaction history and money tracking

Non-Functional Requirements:

The system should be consistent: there should be no mismatch between the funds sent and the funds received. It should also be reliable, meaning that once a transfer is accepted it must actually be carried out. Payments should have exactly-once semantics, so that we never double charge the customer. We also need to maintain a ledger, so that we can trace back what happened if there is a problem.
The system should be highly available, with 99.99% uptime
The system should be PCI compliant
We should notify the customer immediately once the transaction has completed
Results should be delivered to the customer accurately and without duplicates (at-most-once delivery of each result)
The system should scale as load increases

API Design

User Management

Write-

Post/RegisterUser (User id, login,2fa)

get/KycCheck(User id, details)

get/getUserDetails

get/UserBalance

SendMoney

Post/sendPayment{recipient, sender, amount,currency}

Post/receivePayment{recipient, sender, amount,currency}

get/getFraudCheck

get/MLAlgorithm

get/checkPCICompliance

Post/addTotransaction{amount,currency,timeStamp}

Post/addToFile{Amount, data}

get/getUserHistory

post/addPaymentmethod{creditcard/debitcard/bank account}

post/updatePaymentmethod{paymentMethodid}

post/deletePaymentmethod{paymentMethodid}

High-Level Design

First, let me lay out what is going on here. We will have a client, which can be a mobile app or the web, and it talks to an API Gateway whenever the user wants to do anything, such as registering or making a transaction.

Let's start with the user registration flow. For registration, we will have a normal ID and login, and if the user wants to actually transact, they go through a KYC flow. All the KYC documents will be stored in an S3 bucket, and we will keep the relevant session information in session storage and in the cache.

Now, when a user wants to send or receive money, the request goes to the transaction service. The transaction service sends an event to the external payment gateway, and every request carries an idempotency key. The key is a UUID version 4, which is random and effectively unique, combined with the user ID. At the same time, we log the event to Kafka with status pending. The payment gateway acknowledges the request and forwards it to the acquirer, which routes it over the card network (for example Visa) to the issuer. Once the acknowledgment from the payment gateway comes back, we mark the transaction as processing, and when the final result arrives we mark it as completed.

At the same time, we complete the ledger part, where every change goes into a write-ahead log. The log is append-only. In the ledger we append which account was credited, which account was debited, and the amount. The wallet service also checks the balance in Redis.

Before doing any of this, we run a fraud check. The fraud check catches anything suspicious in the flow. It has its own rules: it can look at the last few transactions and check whether the user is already flagged. If the user is trying to send an unusually large amount, or is sending one payment from India and another from the UK at the same time, that is flagged too. We will also run ML models, which we can train ourselves or obtain from a third party, as part of the fraud checks.

Once everything is done, the notification service sends the user an SMS or email, whichever they prefer.
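The rule-based part of the fraud check described above can be sketched roughly as follows. This is only an illustration: the `Txn` type, the `fraud_flags` function, the amount limit, and the one-hour window are all assumptions made for the example, and the ML scoring step is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    user_id: str
    amount: Decimal
    country: str   # country code of the request origin
    at: datetime


# Hypothetical thresholds, chosen only for this sketch.
MAX_AMOUNT = Decimal("10000")
LOCATION_WINDOW = timedelta(hours=1)


def fraud_flags(txn: Txn, recent: list[Txn], user_flagged: bool = False) -> list[str]:
    """Return the names of the rules this transaction trips (empty list = clean)."""
    flags = []
    if user_flagged:
        flags.append("user_previously_flagged")
    if txn.amount > MAX_AMOUNT:
        flags.append("amount_over_limit")
    # Two transactions from different countries close together in time.
    for prev in recent:
        close_in_time = abs(txn.at - prev.at) <= LOCATION_WINDOW
        if close_in_time and prev.country != txn.country:
            flags.append("conflicting_locations")
            break
    return flags
```

A transaction with a non-empty flag list would be held for review or rejected before it reaches the payment gateway.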

If the external payment gateway has an outage, we will use the circuit breaker pattern. After repeated failures the breaker opens and we stop sending requests to the gateway, and during that time we have to tell users that we cannot process payments right now. Gateway outages are rare, so after a timeout we move the breaker to half-open, send a single trial request, and close the breaker again if it succeeds. We can also put the incoming events on a queue and process them once the external payment gateway comes back online.

Second, for the idempotency check, we apply the same key at every level, including the external payment gateway. If by chance the user clicks twice, the gateway has its own internal deduplication, but we send the same user ID and token so that it can tell us whether the payment was already processed. On our side we check Redis first, and on a miss we can also check the database to see whether this payment has already been processed. Since we track pending, processing, and completed transactions, it is easy to return the right answer. In the Redis cache we store a key that combines the UUID and the user ID, together with the response, so that every retry maps to the original response.
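Going back to the gateway outage case, a circuit breaker could look roughly like this. It is a minimal single-threaded sketch: the class name, the failure threshold, and the reset timeout are assumptions for illustration, and a production breaker would also limit concurrent trial calls while half-open.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"  # timeout elapsed: allow a trial request
        return "open"

    def call(self, fn, *args, **kwargs):
        state = self.state
        if state == "open":
            raise CircuitOpenError("gateway calls are suspended")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial request, or too many failures, (re)opens the breaker.
            if state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is open, the caller can catch `CircuitOpenError` and put the payment event on the queue for later processing.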

On how failures during payment processing are handled: if payment processing fails, we retry using exponential backoff with jitter. The reason for jitter is that if many customers hit a failure at the same moment, plain backoff would make them all retry at the same moment too, which overloads the service again. Jitter randomizes the delays so that the customers do not all retry at the same time. If the payment still fails after all the retries, we put it in the dead letter queue.
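The retry behaviour can be sketched as below. This uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential bound. The function names, the default limits, and the plain list standing in for the dead letter queue are assumptions for the example. Each retry is expected to reuse the same idempotency key so that a retried payment is never charged twice.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def process_with_retry(payment, send, dead_letters,
                       max_retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Try send(payment); retry with backoff, then dead-letter on final failure."""
    last_error = None
    for attempt in range(max_retries + 1):  # first attempt plus max_retries retries
        try:
            return send(payment)
        except Exception as exc:  # a real system would retry only transient errors
            last_error = repr(exc)
        if attempt < max_retries:
            sleep(backoff_delay(attempt, base, cap))
    # All attempts failed: park the payment for manual or delayed handling.
    dead_letters.append((payment, last_error))
    return None
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.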

Detailed Component Design

For the deep dive, I will mainly focus on the external payment gateway integration. When a user initiates a payment, we first log a payment event in the checkout service. The checkout event just carries the sender, receiver, and similar fields, and it is sent to the external payment gateway. We create a token and record it in our transaction service and in the Kafka event with status pending. The payment gateway then passes the payment to the acquirer, which sends it over the card network (such as Visa) to the issuer. Once we hear back from them, we mark the transaction as processing, and depending on success or failure we mark it as completed or failed.

At the same time, we need to worry about idempotency: the same event must not be processed twice. So each payment needs an idempotency key, and that key should be stored in the cache. Because retries use exponential backoff, the last retry can arrive long after the first attempt, so the TTL on these cache keys must be longer than the maximum total retry window. Otherwise a late retry would miss the cache and could be processed again.
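The idempotency key and its TTL can be illustrated with a small sketch. The in-memory `IdempotencyStore` below only stands in for the Redis cache, and all names and numbers are assumptions. In Redis itself, the check-and-reserve step would need to be atomic (for example `SET` with the `NX` and `EX` options), which this single-threaded sketch does not model.

```python
import time
import uuid


def retry_window(max_retries, base, cap):
    """Worst-case total backoff time in seconds across all retries."""
    return sum(min(cap, base * 2 ** attempt) for attempt in range(max_retries))


class IdempotencyStore:
    """In-memory stand-in for the cache; entries expire after ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}  # key -> (expires_at, response)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def put(self, key, response):
        self._data[key] = (self.clock() + self.ttl, response)


def new_idempotency_key(user_id):
    """Combine the user ID with a random UUIDv4, as described above."""
    return f"{user_id}:{uuid.uuid4()}"


def handle_payment(key, request, store, charge):
    """Return the stored response for a repeated key instead of charging again."""
    cached = store.get(key)
    if cached is not None:
        return cached
    response = charge(request)
    store.put(key, response)
    return response
```

With 5 retries, a 0.5 second base, and a 30 second cap, `retry_window` gives 15.5 seconds, so the TTL should be comfortably above that, plus whatever time the calls themselves can take.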

Because a payment involves a lot of services that have to communicate, we will use the saga pattern, where each step has a compensating transaction that is run if anything goes wrong. For the saga I would go with orchestration here. I know the orchestrator is a single point of failure, but we can scale it out. With this many services interacting, the choreography style would be hard to follow and debug, so orchestration is probably a good fit for payments.

For storage, we will use a relational database, because we store a lot of related data and we want consistency and atomic transactions. We will also keep everything append-only: nothing can be edited and nothing can be deleted. That way, if anything goes wrong or anything is missing, we can replay the events and understand what actually happened. We will also maintain a double-entry bookkeeping ledger so that every movement of money is correctly traced.
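The append-only, double-entry ledger idea can be sketched like this. The `Entry` and `Ledger` names and the in-memory list are assumptions for the example. In the real design the entries would be rows in the relational database, written in one transaction.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    txn_id: str
    account: str
    amount: Decimal  # positive = credit, negative = debit


class Ledger:
    def __init__(self):
        self._entries = []  # append-only: entries are never edited or deleted

    def record_transfer(self, txn_id, sender, recipient, amount):
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Every transfer writes a debit and a matching credit that sum to zero.
        self._entries.append(Entry(txn_id, sender, -amount))
        self._entries.append(Entry(txn_id, recipient, amount))

    def balance(self, account):
        """Replay all entries for one account to derive its balance."""
        return sum((e.amount for e in self._entries if e.account == account),
                   Decimal("0"))

    def is_balanced(self):
        """The whole ledger must always sum to zero."""
        return sum((e.amount for e in self._entries), Decimal("0")) == 0
```

A refund or a saga compensation would be recorded as a new transfer in the opposite direction, not as an edit to the original entries.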