System requirements


Functional:

1 Users can view the event list and search events by keyword

2 Users can book a ticket for a particular event

3 If the booking succeeds, the user has to pay within 5 minutes

4 Users receive a notification after the payment succeeds

5 Assume all tickets are the same

6 Don't allow anonymous users to buy tickets.


Non-Functional:

CAP:

Partition tolerance is a must.

Strong consistency, because money is involved.

Lower availability is tolerable.

Latency:

Assume all tickets for a hot event sell out in 10s, with millions of users competing for them. Purchasing a ticket must stay low-latency under that load.

Scalability:

A microservice architecture lets each service be scaled horizontally.


Capacity estimation

There are 100 ongoing events every day.

Each event has 10k tickets on average.

Book TPS = 10k * 100 / 86400s ≈ 11.6

Peak:

Hot event

1 million users compete for the tickets, and the tickets sell out in 10s.

Write TPS = 1 million / 10s = 100k at peak.


Users keep refreshing the page before the event becomes available.

Each user refreshes the page every second.

Read QPS = 1 million at peak.
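
The throughput figures above can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers stated in this section; the variable names are illustrative.

```python
# Back-of-the-envelope throughput estimate, using the numbers stated above.
SECONDS_PER_DAY = 86_400

events_per_day = 100
tickets_per_event = 10_000

# Average booking rate if sales were spread evenly over the day.
avg_book_tps = events_per_day * tickets_per_event / SECONDS_PER_DAY

# Hot event: 1 million users compete and the tickets sell out in 10 s.
hot_users = 1_000_000
sellout_seconds = 10
peak_write_tps = hot_users / sellout_seconds

# Each waiting user refreshes the page once per second.
refreshes_per_user_per_second = 1
peak_read_qps = hot_users * refreshes_per_user_per_second

print(f"average book TPS ~ {avg_book_tps:.1f}")
print(f"peak write TPS   = {peak_write_tps:,.0f}")
print(f"peak read QPS    = {peak_read_qps:,.0f}")
```

The gap between the average (about 12 TPS) and the peak (100k TPS) is what drives the hot-event design later in these notes.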


Storage:

100 events every day, 10k tickets per event

1KB for ticket, order, and payment info in total.

100 * 10k * 1KB * 365 = 365GB/year

Considering read replicas, 365GB * 3 ≈ 1.1TB
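
A quick check of the storage arithmetic, assuming the 100 events per day and 10k tickets per event given at the top of the capacity estimation, 1KB per ticket record, and three copies of the data. Decimal units (1KB = 1000 bytes) are an assumption of this sketch.

```python
# Yearly storage estimate from the capacity assumptions in this section.
KB = 1_000  # assumption: decimal units

events_per_day = 100
tickets_per_event = 10_000
bytes_per_ticket = 1 * KB   # ticket + order + payment info
copies = 3                  # primary data plus read replicas

yearly_bytes = events_per_day * tickets_per_event * bytes_per_ticket * 365
yearly_gb = yearly_bytes / 1e9
with_replicas_tb = yearly_bytes * copies / 1e12

print(f"raw data per year: {yearly_gb:.0f} GB")
print(f"with replicas:     {with_replicas_tb:.2f} TB")
```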



API design

api/v1/event/view

request {eventId}

response

{

eventId

desc

ticketNum

price

detail

}

api/v1/order/create

request {userId, paymentInfo, eventId}

response {statusCode, error}

api/v1/order/view

request {userId, orderId}

response

{

{

orderId

status

detail

}

}

api/v1/payment/

request {userId, orderId, paymentInfo}

response {statusCode}

api/v1/payment/callBack

request {orderId, statusCode}

response {}
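
To make the request and response shapes concrete, here is a minimal sketch of the order-create payloads as typed records, with a check for the "no anonymous users" requirement. The field names come from the API list above; the class names, the function name, and the status codes 400 and 401 are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateOrderRequest:
    userId: Optional[int]   # None models an anonymous caller
    paymentInfo: str
    eventId: int


@dataclass
class CreateOrderResponse:
    statusCode: int
    error: Optional[str] = None


def reject_if_invalid(req: CreateOrderRequest) -> Optional[CreateOrderResponse]:
    """Return an error response for a bad request, or None if it may proceed."""
    if req.userId is None:
        # Functional requirement 6: anonymous users cannot buy tickets.
        return CreateOrderResponse(statusCode=401, error="login required")
    if req.eventId <= 0:
        return CreateOrderResponse(statusCode=400, error="invalid eventId")
    return None


anonymous = CreateOrderRequest(userId=None, paymentInfo="tok_x", eventId=1)
print(reject_if_invalid(anonymous))
```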




Database design

Event

{

id int

desc char(100)

ticketTotal int

ticketNum int

ticketPrice int

updateTime timestamp

}

Order

{

id int

orderId int

detail int

status int

createTime timestamp

updateTime timestamp

}

User

{

id int

name char(20)

profile char(512)

Email char(64)

Phone int

paymentInfo char(512)

}

Stock

{

id int

eventId int

quantity int

name char(20)

updateTime timestamp

}


Order status:

0 unpaid

1 paying

2 cancelled

3 succeed

4 failed
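
The status codes above can be written as an enum together with the transitions the rest of these notes rely on. This is a sketch: the transitions unpaid -> paying, paying -> succeed, and unpaid/failed -> cancelled are taken from the request flows in this document, while paying -> failed is an assumption about when 'failed' is set.

```python
from enum import IntEnum


class OrderStatus(IntEnum):
    UNPAID = 0
    PAYING = 1
    CANCELLED = 2
    SUCCEED = 3
    FAILED = 4


# Allowed next states for each state; terminal states map to an empty set.
ALLOWED_TRANSITIONS = {
    OrderStatus.UNPAID: {OrderStatus.PAYING, OrderStatus.CANCELLED},
    OrderStatus.PAYING: {OrderStatus.SUCCEED, OrderStatus.FAILED},  # FAILED: assumption
    OrderStatus.FAILED: {OrderStatus.CANCELLED},
    OrderStatus.CANCELLED: set(),
    OrderStatus.SUCCEED: set(),
}


def can_transition(current: OrderStatus, new: OrderStatus) -> bool:
    """True if moving an order from `current` to `new` is allowed."""
    return new in ALLOWED_TRANSITIONS[current]


print(can_transition(OrderStatus.UNPAID, OrderStatus.PAYING))
print(can_transition(OrderStatus.SUCCEED, OrderStatus.CANCELLED))
```

Checking transitions in one place keeps a late timeout message from cancelling an order that has already been paid.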



High-level design

Event Service

Order Service

Payment Service

Notification Service






Request flows

Order flow:

0 User sends an order request.

1 API gateway routes the request to Order Service; Order Service starts a transaction.

2 Order Service deducts the ticket quantity in stock DB. If that fails, roll back.

3 Order Service creates the order in DB with initial status 'unpaid', then commits the transaction.

4 Order Service sends a delayed message to Kafka, to be consumed after 5 minutes (the payment window).
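
Steps 1 to 3 of the order flow can be sketched as a single database transaction. The example uses an in-memory SQLite database purely so it runs anywhere; the table layout, column names, and function names are assumptions, and sending the delayed message (step 4) is left out.

```python
import sqlite3

UNPAID = 0


class SoldOut(Exception):
    pass


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE stock (event_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            event_id INTEGER NOT NULL,
            status INTEGER NOT NULL
        );
        """
    )


def create_order(conn: sqlite3.Connection, user_id: int, event_id: int):
    """Deduct one ticket and create an unpaid order atomically.

    Returns the new order id, or None if the event is sold out.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE stock SET quantity = quantity - 1 "
                "WHERE event_id = ? AND quantity > 0",
                (event_id,),
            )
            if cur.rowcount != 1:
                raise SoldOut(event_id)
            cur = conn.execute(
                "INSERT INTO orders (user_id, event_id, status) VALUES (?, ?, ?)",
                (user_id, event_id, UNPAID),
            )
            return cur.lastrowid
    except SoldOut:
        return None


conn = sqlite3.connect(":memory:")
init_schema(conn)
conn.execute("INSERT INTO stock (event_id, quantity) VALUES (1, 2)")
conn.commit()

# Three users compete for two tickets; the third request is rejected.
results = [create_order(conn, user_id=u, event_id=1) for u in (101, 102, 103)]
print(results)
```

The `quantity > 0` condition in the UPDATE is what prevents overselling: the deduction and the stock check happen in one statement.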


Payment flow

0 User calls the API gateway.

1 API gateway routes the request to Payment Service.

2 Payment Service calls the 3rd-party SDK (Stripe, PayPal, Visa) and updates the order status to 'paying'.

3 The 3rd party calls back Payment Service with the payment result.

4 Payment Service receives the result and sends a message to Kafka.


Worker flow

1 Workers consume delayed messages and payment success messages from Kafka.

2 For a payment success message, workers update the order status from 'paying' to 'succeed'.

3 For a delayed message, workers update the order status to 'cancelled' if the current status is 'unpaid' or 'failed'.

4 A DB trigger notifies Notification Service, which notifies the user.
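
The worker logic can be sketched as a single message handler. Orders are held in a plain dict here instead of a database, and the message fields (`type`, `orderId`, `dueAt`) are hypothetical. The sketch also assumes the delay is enforced on the consumer side by comparing a due timestamp carried in the message.

```python
UNPAID, PAYING, CANCELLED, SUCCEED, FAILED = range(5)


def handle_message(orders: dict, msg: dict, now: float) -> bool:
    """Apply one message to the order table.

    Returns True if the message was processed, False if it is a delayed
    message that is not due yet and should be retried later.
    """
    order_id = msg["orderId"]

    if msg["type"] == "payment_succeeded":
        # Only move forward from 'paying'; ignore duplicates or stale messages.
        if orders[order_id] == PAYING:
            orders[order_id] = SUCCEED
        return True

    if msg["type"] == "order_timeout":
        if now < msg["dueAt"]:
            return False
        # Cancel only orders that were never paid.
        if orders[order_id] in (UNPAID, FAILED):
            orders[order_id] = CANCELLED
        return True

    raise ValueError(f"unknown message type: {msg['type']}")


orders = {1: PAYING, 2: UNPAID, 3: SUCCEED}
handle_message(orders, {"type": "payment_succeeded", "orderId": 1}, now=0)
early = handle_message(orders, {"type": "order_timeout", "orderId": 2, "dueAt": 300}, now=10)
handle_message(orders, {"type": "order_timeout", "orderId": 2, "dueAt": 300}, now=301)
handle_message(orders, {"type": "order_timeout", "orderId": 3, "dueAt": 300}, now=301)
print(orders, early)
```

Because every update is conditional on the current status, replaying a message leaves the order unchanged.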


Detailed component design

Handle the concurrency issue for a hot event

Hot event: 10k tickets

1m users compete for these tickets, and they sell out in 10s

100k write requests/s at peak

A single MySQL row handles < 500 update TPS, so we can't deduct stock with one row update per request


0 A cron job loads the event ticket quantities into the Redis cache in advance.


0 User order requests are sent to a Kafka cluster.

1 Order services consume the Kafka events and check the in-memory cache for whether tickets are still available (eventId = true).

2 Order service tries DECR in Redis; if it fails (no stock left), set eventId = false in the in-memory cache.

3 If the Redis DECR succeeds, send an event to decr_event for stock DB to consume.

4 Generate the order, insert it into order DB, and set its status to unpaid.
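
The hot-event path above can be sketched as follows. To keep the example runnable without a server, `FakeRedis` is an in-process stand-in that mimics only the atomic "decrement and return the new value" behaviour of DECR; the class names, the `stock:<eventId>` key format, and the lists standing in for the decr_event topic and order DB are all assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeRedis:
    """In-process stand-in for a Redis client (set and atomic decr only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = int(value)

    def decr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) - 1
            return self._data[key]


class OrderService:
    def __init__(self, redis):
        self.redis = redis
        self.available = {}    # in-memory cache: eventId -> still on sale?
        self.decr_events = []  # stand-in for the decr_event topic
        self.orders = []       # stand-in for order DB
        self._lock = threading.Lock()

    def handle(self, user_id, event_id):
        # Step 1: skip Redis entirely once the event is known to be sold out.
        if not self.available.get(event_id, True):
            return False
        # Step 2: atomic decrement; a negative result means no stock was left.
        if self.redis.decr(f"stock:{event_id}") < 0:
            self.available[event_id] = False
            return False
        with self._lock:
            # Step 3: ask stock DB to apply the deduction asynchronously.
            self.decr_events.append({"eventId": event_id, "delta": -1})
            # Step 4: create the order with status 0 (unpaid).
            self.orders.append({"userId": user_id, "eventId": event_id, "status": 0})
        return True


redis = FakeRedis()
redis.set("stock:1", 100)  # preloaded in advance (the cron job step)
service = OrderService(redis)

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda uid: service.handle(uid, 1), range(1000)))

print(sum(results), "orders created out of", len(results), "requests")
```

The in-memory flag only saves Redis round trips after sell-out; correctness comes from the atomic decrement, so at most 100 requests can succeed no matter how the threads interleave.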



Trade offs/Tech choices

Kafka: shaves peak traffic and keeps servers from crashing under load.

Redis: in-memory cache, for fast updates.

Distributed transaction: avoids inconsistency.




Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

Monitoring

Monitor 3rd-party RPC success rate

Counters for order created, order succeeded, order failed, order cancelled
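
A minimal sketch of these counters, kept in process memory for illustration. The metric names and helper functions are assumptions; a real deployment would export the values to whatever metrics system is in use.

```python
from collections import Counter

metrics = Counter()


def record_order_event(name: str) -> None:
    """Count one order event: 'created', 'succeeded', 'failed' or 'cancelled'."""
    metrics[f"order_{name}"] += 1


def record_rpc(success: bool) -> None:
    """Count one call to a 3rd-party payment provider."""
    metrics["rpc_total"] += 1
    if success:
        metrics["rpc_success"] += 1


def rpc_success_rate():
    """Fraction of successful 3rd-party calls, or None before any call."""
    total = metrics["rpc_total"]
    return metrics["rpc_success"] / total if total else None


record_order_event("created")
record_order_event("succeeded")
for ok in (True, True, True, False):
    record_rpc(ok)

print(dict(metrics), rpc_success_rate())
```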