My Solution for Design an Auction system with Score: 7/10

by redzrdi

System requirements


Functional:

  1. A user should be able to sign up to the auction service to participate in auctions held in the system as a seller or as a buyer/bidder.
  2. A registered user should be able to create an auction with the following details:
    1. Item details: name, description, quantity, photos of the item
    2. Start time of auction
    3. End time of auction
    4. Base price (optional)
    5. Winner selection strategy type and details. The type can be manual or automatic. If automatic, there can be some preset strategies, the default being "highest bidder wins".
    6. Inventory details: location of the place from where the item can be picked up.
    7. Payment details: the account where payment will be received. This can be taken from the seller's profile if not provided for the particular auction instance.
    8. Labels for the auction instance for easy discoverability.
  3. An auction creator should not be able to bid on their own auctions.
  4. On login, a user should be able to view the list of currently live or future auctions. There should also be a rich search experience where the user can search for auctions of items matching their interests (item type, price level, etc.).
  5. On selecting a live auction, a user should be able to see its details: name, description, image, list of bids placed so far, and auction start and end times.
  6. If a user chooses to participate in an auction as a bidder, they have to place a bid with the following details:
    1. Bid price
    2. Optional account details, if auto-debit is opted for.
  7. An auction goes through the following states:
    1. CREATED
    2. VALIDATED
    3. ACCEPTING_BIDS - the auction is open and accepting bids.
    4. BIDDING_CLOSED - the auction is no longer accepting bids. There might be a user action here to manually select a winner.
    5. PROCESSING_PAYMENT
      1. CREATE_PAYMENT_PLAN
      2. PROCESS_PAYMENT
      3. RECORD_TXN
    6. FULFILMENT_IN_PROGRESS
    7. FINISHED_SUCCESSFULLY
    8. AUCTION_FAILED - this is the error terminal state. It is reached on errors in the PROCESSING_PAYMENT or FULFILMENT_IN_PROGRESS states.
    9. FINISHED_WITH_NO_BID - no bids were received within the auction time window.
    10. AUCTION_DECLINED - validation failure.
  8. A bidder is notified about (selected) state changes of the auctions they have participated in. If they win an auction, the notification message will contain the details of the payment they have to make. Thereafter they will receive fine-grained notifications about the state of the shipping until they receive the shipment and the auction is deemed finished successfully.
  9. In case of failure in the main auction workflow, there are compensatory workflows defined. E.g. if the fulfilment system reports a terminal failure, the compensation flow is triggered, whereby the payment might be reversed and/or additional credits are given to the aggrieved party.
  10. The system supports an admin user persona who can log in to the system for the following actions:
    1. Set auction validation rules
    2. Set pricing policy and business campaigns whereby users get special discounts or credits for running an auction.
  11. There is also a reputation scoring system that assigns reputation points to buyers and sellers based on their transaction history.
  12. The auction subdomain (core subdomain) is responsible for owning and running the auction process and handling the core logic of the auction state machine. It is also responsible for handling pricing and for enabling the end-user experience in the auction system.
  13. The auction subdomain integrates with the following supporting subdomains for specific functionality in the overall business:
    1. User Management System - owns user registration, profile maintenance, regulatory verifications, etc.
    2. Payment Orchestration System - provides the necessary orchestration to pull payment from the bidder's payment instrument and credit the seller's account. It maintains integrations with various payment gateways.
    3. User Notification System - responsible for delivering notifications from other systems to the end user in a reliable and scalable way.
    4. Fulfilment System - responsible for picking up the sold item from the seller's location and shipping it to the buyer's location reliably.
    5. Transaction Recording System - records the financial transactions. It is important for reconciliation and revenue realization.
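
The auction states listed in item 7 can be read as a state machine. The following is a minimal sketch under assumptions: the original lists the states but not the allowed transitions, so the edges in `TRANSITIONS` and the `advance` helper are hypothetical, and the sub-states of PROCESSING_PAYMENT are left out for brevity.

```python
from enum import Enum, auto


class AuctionState(Enum):
    CREATED = auto()
    VALIDATED = auto()
    ACCEPTING_BIDS = auto()
    BIDDING_CLOSED = auto()
    PROCESSING_PAYMENT = auto()
    FULFILMENT_IN_PROGRESS = auto()
    FINISHED_SUCCESSFULLY = auto()
    AUCTION_FAILED = auto()
    FINISHED_WITH_NO_BID = auto()
    AUCTION_DECLINED = auto()


S = AuctionState

# Assumed edges (not given in the write-up); terminal states map to an empty set.
TRANSITIONS = {
    S.CREATED: {S.VALIDATED, S.AUCTION_DECLINED},
    S.VALIDATED: {S.ACCEPTING_BIDS},
    S.ACCEPTING_BIDS: {S.BIDDING_CLOSED, S.FINISHED_WITH_NO_BID},
    S.BIDDING_CLOSED: {S.PROCESSING_PAYMENT},
    S.PROCESSING_PAYMENT: {S.FULFILMENT_IN_PROGRESS, S.AUCTION_FAILED},
    S.FULFILMENT_IN_PROGRESS: {S.FINISHED_SUCCESSFULLY, S.AUCTION_FAILED},
    S.FINISHED_SUCCESSFULLY: set(),
    S.AUCTION_FAILED: set(),
    S.FINISHED_WITH_NO_BID: set(),
    S.AUCTION_DECLINED: set(),
}


def advance(current: AuctionState, target: AuctionState) -> AuctionState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table as data rather than as scattered `if` checks makes it easy to review the lifecycle in one place and to reject illegal moves such as reopening a finished auction.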



Non-Functional:

  1. Scalability - The system should be able to support high traffic, especially spiky write load during popular and flash auctions that are active for short windows.
  2. Strong consistency - Strong consistency is required because financial transactions are involved and the workflow spans multiple systems.




Capacity estimation

  1. Read throughput estimate - Assuming a DAU of 7,000 and 60 read queries per user per day, there are a total of 420,000 read queries per day.

Assuming 50% of these queries come in during 4 hours of peak time per day,

Peak read API throughput = (420,000 × 0.5) / (4 × 3,600 s) ≈ 14.6 tps. This throughput can be delivered by a single core. For the sake of high availability, a minimum of 2 large EC2 instances (2 cores, 4 GB memory) can be considered.
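
The read-side arithmetic can be reproduced with a few lines. This is only a sketch of the estimate; the constant names are illustrative and the inputs are the assumptions stated above.

```python
DAU = 7_000
READS_PER_USER_PER_DAY = 60
PEAK_SHARE = 0.5   # fraction of daily traffic that falls in the peak window
PEAK_HOURS = 4

daily_reads = DAU * READS_PER_USER_PER_DAY
peak_read_tps = daily_reads * PEAK_SHARE / (PEAK_HOURS * 3600)

print(f"daily reads: {daily_reads}")
print(f"peak read throughput: {peak_read_tps:.1f} tps")
```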

  2. Write throughput estimate - Assuming a DAU of 7,000 and 5 auctions per active user per month, this translates to about 1,200 new auctions created per day (7,000 × 5 / 30 ≈ 1,167, rounded up). Assuming an average of 10 bids per auction, that is 12,000 bids per day. In a message-centric design there are about 7 messages per auction (based on the state machine), thus 8,400 new auction message entries per day. Assuming a standard size of 5 KB/row for both the auction message and bid entry tables, total size per day = 42,000 KB + 60,000 KB = 102 MB/day. For a retention of 2 years, total data volume to be stored = 102 MB/day × 730 days ≈ 74.5 GB => 100 GB (to account for the index space).

Assuming 50% of the write volume per day happens during 4 hours,

Peak IO bandwidth = (102 / 2) / 4 = 12.75 MB/hr ≈ 3.5 KB/s.

Based on these numbers, a single small or medium RDS instance with 100 GB of disk space is sufficient for the master.

To ensure high availability, a 2+2 instance deployment across 2 AZs is recommended, with a single-master, hot-secondary setup.

Write service estimate - From the above calculation, the total API write volume per day is 13,200 (1,200 auction creations + 12,000 bids).

Thus, peak API write throughput = (13,200 × 0.5) / 4 = 1,650 / hr ≈ 0.5 tps.
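
As a cross-check, the write-side figures can be recomputed from the stated inputs. This is a sketch only; the constant names are illustrative, and decimal units (1 MB = 1,000 KB, 1 GB = 1,000 MB) are an assumption.

```python
AUCTIONS_PER_DAY = 1_200     # rounded figure used in the estimate
BIDS_PER_AUCTION = 10
MESSAGES_PER_AUCTION = 7     # state-machine messages per auction
ROW_KB = 5
RETENTION_DAYS = 730
PEAK_SHARE = 0.5
PEAK_HOURS = 4

bids_per_day = AUCTIONS_PER_DAY * BIDS_PER_AUCTION          # 12,000
messages_per_day = AUCTIONS_PER_DAY * MESSAGES_PER_AUCTION  # 8,400

# Storage: every bid row and every auction message row costs ROW_KB.
daily_kb = (bids_per_day + messages_per_day) * ROW_KB
daily_mb = daily_kb / 1000
total_gb = daily_mb * RETENTION_DAYS / 1000

# Peak IO: half of the daily volume squeezed into the peak window.
peak_kb_per_s = daily_kb * PEAK_SHARE / (PEAK_HOURS * 3600)

# API writes: one call per auction creation plus one per bid.
api_writes_per_day = AUCTIONS_PER_DAY + bids_per_day
peak_write_tps = api_writes_per_day * PEAK_SHARE / (PEAK_HOURS * 3600)

print(f"daily volume: {daily_mb:.0f} MB, retained: {total_gb:.1f} GB")
print(f"peak IO: {peak_kb_per_s:.1f} KB/s, peak writes: {peak_write_tps:.2f} tps")
```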


For operational simplicity, the read and write APIs can be served from the same service, as their combined throughput requirement per this calculation is low. The estimation is dominated by the high availability concern.


There should be an active data archival policy, scheduled at regular intervals, to keep the data size under control.





API design

  1. registerUser - this is an event-driven integration with the User Management system's UserCreated event.
  2. UpdateUserStatus - this is an event-driven integration with the User Management system's UserDeleted/Updated event. Only
  3. createAuction - POST /auctions
  4. UpdateAuction - PUT /auctions/<auctionID>
  5. Get Auction Details - GET /auctions/<auctionID>
  6. Search auctions - POST /auctions/search - provides a rich search experience such as full-text search, faceted match, etc.
  7. place Bid - POST /auctions/<auctionID>/bids
  8. cancel Bid - PUT /auctions/<auctionID>/bids/<bidID>
  9. list Bids -- GET /auctions/<auctionID>/bids
  10. Get winning Bid -- GET /auctions/<auctionID>/winningBid
  11. Get settlement invoice -- GET /auctions/<auctionID>/invoices
  12. Initiate Payment -- POST /auctions/<auctionID>/payments
  13. Get Payment status -- GET /auctions/<auctionID>/payments/<paymentID>
  14. Add MFA challenge details( e.g otp) to PAYMENT -- POST /auctions/<auctionID>/payments/<paymentID>/challenges
  15. Get receipt - POST /auctions/<auctionID>/receipts
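
To illustrate what the place Bid endpoint would have to check, here is a minimal in-memory sketch. The `Auction` and `Bid` classes and the `place_bid` function are hypothetical names, not part of the design above. The rule that a bid must not be below the base price is an assumption; the seller restriction and the ACCEPTING_BIDS check come from the functional requirements.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Bid:
    bidder_id: str
    price: float


@dataclass
class Auction:
    seller_id: str
    start: datetime
    end: datetime
    state: str = "ACCEPTING_BIDS"
    base_price: Optional[float] = None
    bids: List[Bid] = field(default_factory=list)


def place_bid(auction: Auction, bidder_id: str, price: float, now: datetime) -> Bid:
    """Validate and record a bid; raises ValueError when a rule is violated."""
    if bidder_id == auction.seller_id:
        raise ValueError("seller cannot bid on own auction")
    if auction.state != "ACCEPTING_BIDS" or not (auction.start <= now < auction.end):
        raise ValueError("auction is not accepting bids")
    # Assumed rule: a bid below the (optional) base price is rejected.
    if auction.base_price is not None and price < auction.base_price:
        raise ValueError("bid is below the base price")
    bid = Bid(bidder_id, price)
    auction.bids.append(bid)
    return bid
```

In a real service these checks would run inside the same database transaction that inserts the bid row, so that a concurrent state change cannot slip in between validation and write.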





Database design

  1. The DB ER diagram shows a highly normalized structure meant for a relational DB in the write path. The normalized structure helps in attaining strong consistency.
  2. The payment and fulfilment stages of the overall Auction entity's lifecycle are completed by the orchestrator using a configurable workflow definition. Here the exit event of each stage of the workflow is stored. On the write path, this append-only, immutable structure achieves high consistency and better performance while avoiding the expensive coordination that a mutable structure would need.
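
The append-only storage of stage exit events can be sketched with SQLite from the Python standard library. The table name `workflow_event`, its columns, and the two helper functions are assumptions made for illustration; the actual schema is in the ER diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workflow_event (
        event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        auction_id TEXT NOT NULL,
        stage      TEXT NOT NULL,
        outcome    TEXT NOT NULL
    )
    """
)


def record_exit(auction_id: str, stage: str, outcome: str) -> None:
    """Append one immutable exit event; rows are never updated or deleted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO workflow_event (auction_id, stage, outcome) VALUES (?, ?, ?)",
            (auction_id, stage, outcome),
        )


def latest_stage(auction_id: str):
    """Derive the current workflow position from the most recent event."""
    return conn.execute(
        "SELECT stage, outcome FROM workflow_event "
        "WHERE auction_id = ? ORDER BY event_id DESC LIMIT 1",
        (auction_id,),
    ).fetchone()


record_exit("a1", "CREATE_PAYMENT_PLAN", "OK")
record_exit("a1", "PROCESS_PAYMENT", "OK")
print(latest_stage("a1"))
```

Because writers only insert, they never contend on an existing row, and the current state is a pure function of the event history.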






High-level design

  1. Auction Service - This microservice is responsible for managing the lifecycle of the Auction domain entity (the aggregate root), including its bids. The state machine associated with the Auction domain tracks the lifecycle until the bidding is closed and a winning bid is selected. The post-processing, i.e. payment processing and fulfilment, is kept outside the scope of this state machine. This service is only responsible for handling the write traffic and can be scaled and made highly available independently.
  2. Auction Orchestration Service - This service runs the workflows that define the payment processing and fulfilment stages. Keeping it as a separate microservice helps in tuning this component independently. The NFR focus of this component is high throughput and high resiliency. An appropriate technology (a workflow engine like Camunda) can be chosen with this modelling.
  3. Auction Read Service - This service provides a flexible, full-text, faceted search over all the auctions in the system. It shows an aggregated (denormalized) Auction read model that also encompasses the payment and fulfilment states and their substates. It also includes all artifacts like invoice, payment details, receipt, shipping waybill, etc. from this extended state machine as part of the same entity. The Read Service might also read some data, like the bids of an active auction, directly from a read replica of the Auction Service to ensure freshness.
  4. Payment Engine - The Payment Engine owns all payment-related entities and logic. This functionality is required for interacting with the Payment Orchestration and Transaction Recording systems.
  5. Admin Service - It is the common repository for the different microservices.
  6. API Service - This stateless microservice acts as a proxy and surfaces a unified API layer for the consumers.
  7. Central Message Broker - This enables async communication between the various microservices in the system, e.g. the Auction Service triggering a workflow in the Orchestrator, read model aggregation from relevant events of different microservices, or async communication with external systems through the corresponding gateways.
  8. Auction Event Aggregation Pipeline - It listens to auction events from the Auction Service and the Orchestrator microservice, aggregates them, and creates/updates a denormalized AuctionRead entity. This uses Flink for reliable and scalable stream processing.
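
The aggregation step in item 8 is essentially a fold over an event stream. The sketch below shows the idea in plain Python rather than Flink; the event shapes (`type`, `state`, `name`, `ref`) and the fields of the read document are assumptions for illustration.

```python
from typing import Dict, Iterable


def aggregate(events: Iterable[dict]) -> Dict[str, dict]:
    """Fold a stream of events into one denormalized read document per auction."""
    read_model: Dict[str, dict] = {}
    for ev in events:
        doc = read_model.setdefault(
            ev["auction_id"], {"state": None, "bid_count": 0, "artifacts": {}}
        )
        if ev["type"] == "STATE_CHANGED":
            doc["state"] = ev["state"]
        elif ev["type"] == "BID_PLACED":
            doc["bid_count"] += 1
        elif ev["type"] == "ARTIFACT_ADDED":  # e.g. invoice, receipt, waybill
            doc["artifacts"][ev["name"]] = ev["ref"]
    return read_model
```

In a streaming engine the same logic would run as keyed state per auction ID, with each updated document written to the search index.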






Request flows







Detailed component design







Trade offs/Tech choices

  1. A relational DB is selected on the write path to enable strong consistency on the bidding flow.
  2. Orchestrator and AuctionService are split into 2 different services as they have different NFRs and different tech choices. AuctionService is a traditional web app providing a sync REST API and is tuned for low latency and strong consistency, whereas Orchestrator is throughput-oriented and most of its interactions are async.





Failure scenarios/bottlenecks

  1. Event propagation to the Elasticsearch index through the async flow and aggregation pipeline could take time, so a user might not see the most up-to-date information needed to make the correct decision.

Mitigation - The Read Service might intelligently identify the auctions whose details are most likely to be stale, e.g. the ones with the closest bid_closing time, and augment the ES results with OLTP DB data from the read replica.
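
A minimal sketch of this mitigation, assuming search hits are dicts with `auction_id` and `bid_closing_time` keys. The 5-minute `FRESHNESS_WINDOW` and the `fetch_from_replica` callback are hypothetical; in practice the callback would query the read replica.

```python
from datetime import datetime, timedelta
from typing import Callable, List

FRESHNESS_WINDOW = timedelta(minutes=5)  # assumed threshold, to be tuned


def augment(results: List[dict], now: datetime,
            fetch_from_replica: Callable[[str], dict]) -> List[dict]:
    """Overlay replica data on search hits whose bidding closes soon."""
    merged = []
    for hit in results:
        closes_in = hit["bid_closing_time"] - now
        if timedelta(0) <= closes_in <= FRESHNESS_WINDOW:
            # Replica fields win over the possibly stale index fields.
            hit = {**hit, **fetch_from_replica(hit["auction_id"])}
        merged.append(hit)
    return merged
```

Only the few auctions close to their deadline trigger a replica lookup, so the extra load on the OLTP side stays bounded.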

  2. In a high-load bidding scenario, especially if many bids are expected to have the same price, a tie-breaker policy needs to be defined. The default solution based on local system time (even if DB server time is considered) is subject to clock skew and might produce a wrong result (a later bid having an earlier timestamp).
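
One possible tie-breaker, offered as an assumption rather than something the design above prescribes, is to order equal-priced bids by a strictly increasing sequence number assigned by the single write master instead of by timestamp. The `Bid` fields and `pick_winner` below are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bid:
    bidder_id: str
    price: int  # minor currency units, to avoid float comparison issues
    seq: int    # assumed: assigned by the single write master, strictly increasing


def pick_winner(bids: List[Bid]) -> Bid:
    """Highest price wins; among equal prices, the lowest sequence number wins."""
    if not bids:
        raise ValueError("no bids")
    return min(bids, key=lambda b: (-b.price, b.seq))
```

A sequence reflects the order in which the master accepted the writes, so it does not depend on wall clocks agreeing across machines.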




Future improvements

  1. AuctionService can be made to interact with a Kafka topic backend instead of a DB for the auction and bid entities. Total ordering of auction and bid events, along with durability and high availability guarantees, might give the same strong consistency guarantees as were achievable with a DB.
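
Kafka preserves order within a partition, so the ordering argument relies on all events of one auction landing in the same partition. The toy model below illustrates key-based partitioning in plain Python. It does not use a Kafka client, and `NUM_PARTITIONS`, the CRC32 stand-in hash, and the in-memory `log` are assumptions for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count


def partition_for(auction_id: str) -> int:
    # Stand-in hash; any stable hash of the key works for this sketch.
    return zlib.crc32(auction_id.encode()) % NUM_PARTITIONS


log = defaultdict(list)  # partition number -> ordered list of (auction_id, event)


def append(auction_id: str, event: str) -> int:
    """Append to the auction's partition and return the event's offset."""
    part = log[partition_for(auction_id)]
    part.append((auction_id, event))
    return len(part) - 1
```

Keying by auction ID gives a total order per auction, which is what bid resolution needs. Ordering across different auctions is not guaranteed and is not required here.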