System requirements
Functional:
- Users can search for events with filters.
- Users can book events and receive tickets.
- Users can cancel an event reservation or resell their tickets.
Non-Functional:
- No double booking: the booking path must be transactional and idempotent.
- Search latency should stay within 250 ms.
- Data integrity and security whenever a purchase is involved. We assume a payment service provider (PSP) is given.
Capacity estimation
- Assume 100M DAU
- Each user browses 10 events and books 1 event per day. (Read heavy)
- Roughly 10K read QPS, and 1K write QPS.
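The QPS figures above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch that assumes traffic is spread evenly across the day, which is an assumption of this example; real peaks, such as a ticket drop, would be well above the average. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, actions_per_user_per_day: float) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY


DAU = 100_000_000
read_qps = average_qps(DAU, 10)   # 10 event views per user per day
write_qps = average_qps(DAU, 1)   # 1 booking per user per day

print(f"read  ~{read_qps:,.0f} QPS")
print(f"write ~{write_qps:,.0f} QPS")
```

The averages land slightly above 10K and 1K, so the rounded figures are a reasonable order-of-magnitude estimate.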
API design
Start with REST. Open to discussing RPC if HTTP/2 or later is supported.
Pass user info in a header as a JWT so the request can be authenticated and its claims integrity-checked.
Like Ticketmaster and similar sites, we use a two-phase booking flow: the user first attempts to book, which reserves the seat for 5 mins or so, and then makes the real purchase commit.
POST /v1/attempt
{
"eventId"
}
POST /v1/commit
Header
{
"idempotentKey"
}
Body
{
"eventId"
}
GET /v1/events?{searchTerm,zipCode,price...filter}
GET /v1/booking/{bookingId}
PUT /v1/booking/{bookingId}
{
"updateInfo"
}
DELETE /v1/booking/{bookingId}
POST /v1/postTicket
{
"originalTicketId"
"resalePrice"
}
Database design
Assume a relational database with PostgreSQL. Open to revisiting this once we confirm a working basic solution and analyze the read and write access patterns.
Customer
=======
customer_id PK
Event
=====
event_id PK
Booking
======
booking_id PK (ConfirmationId)
idempotent_key Unique
customer_id FK
Seat
======
seat_id PK
booking_id FK
state OPEN | HOLD | PENDING | SOLD (HOLD: held for 5 mins; PENDING: waiting for payment; SOLD: the seat is no longer available)
price
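The seat states above imply a small state machine. The sketch below is one possible reading of it: the schema does not spell out the transitions, so the `ALLOWED` table is an assumption (in particular, the SOLD to OPEN edge for cancellation or resale), and the names `SeatState` and `transition` are illustrative.

```python
from enum import Enum


class SeatState(Enum):
    OPEN = "OPEN"
    HOLD = "HOLD"
    PENDING = "PENDING"
    SOLD = "SOLD"


# Assumed transitions; the schema only lists the states themselves.
ALLOWED = {
    SeatState.OPEN: {SeatState.HOLD},                      # /attempt reserves the seat
    SeatState.HOLD: {SeatState.PENDING, SeatState.OPEN},   # /commit starts payment, or the hold expires
    SeatState.PENDING: {SeatState.SOLD, SeatState.OPEN},   # payment succeeds, or fails
    SeatState.SOLD: {SeatState.OPEN},                      # cancellation or resale frees the seat
}


def transition(current: SeatState, target: SeatState) -> SeatState:
    """Return the new state, or raise if the move is not in the ALLOWED table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal seat transition {current.value} -> {target.value}")
    return target
```

Enforcing the table in one place keeps a seat from jumping straight from OPEN to SOLD without a hold and a payment step.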
High-level design
Request flows
Detailed component design
Trade offs/Tech choices
_For low latency search, especially with fuzzy search in search term, what can you do to improve the system?_
> Consider Elasticsearch for search queries, and use Change Data Capture (CDC) from Postgres to keep it in sync. The low-cost option reads the replication slot directly; the higher-cost option builds on Kafka with Debezium.
_Assume we have a superstar event with lots of write traffic. What can we do?_
> Put a queue such as Kafka in front of ticket commit, and have the FE hold a Server-Sent Events (SSE) connection with the backend. When the Kafka consumer finishes charging, we remove the loading state and render the complete booking information, such as the confirmationId.
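The queue-plus-SSE flow can be sketched in a few lines. This is only an illustration under stated assumptions: `queue.Queue` stands in for Kafka, `charge` is a hypothetical PSP call that returns a confirmation id, and the event name `booking-complete` is made up for the example. The only fixed part is the SSE wire format, where each message is a set of `field: value` lines ended by a blank line.

```python
import json
import queue
from typing import Callable, Iterator


def sse_frame(event: str, payload: dict) -> str:
    """Format one SSE message: an event name, a JSON data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def process_commits(commits: "queue.Queue[dict]",
                    charge: Callable[[dict], str]) -> Iterator[str]:
    """Drain queued commit requests and yield one SSE frame per finished booking."""
    while True:
        try:
            request = commits.get_nowait()
        except queue.Empty:
            return
        confirmation_id = charge(request)  # hypothetical PSP call
        yield sse_frame(
            "booking-complete",
            {"eventId": request["eventId"], "confirmationId": confirmation_id},
        )
```

In a real deployment the frames would be written to the open SSE response for the matching user rather than yielded from a single generator.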
Failure scenarios/bottlenecks
_How to ensure database scaling for read?_
> This is a read heavy system, we can use read replica to scale.
_How to ensure one seat is held by only one customer in a scaled environment?_
> On the /attempt call, use Postgres SELECT ... FOR UPDATE SKIP LOCKED to ensure each seat is assigned to only one user. Under the extreme load of a superstar event with heavy reads and writes (attempts), we can introduce Redis as an extra component, storing key ticketId with value customerId and a 5-min TTL to represent the ticket hold. This further reduces database I/O, and Redis can guarantee atomic operations with a Lua script.
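The hold semantics can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not a Redis client: `try_hold` mimics what `SET key value NX EX 300` would do, and `release` mimics the compare-and-delete step that would typically need a Lua script to stay atomic. The class name, method names, and injectable clock are assumptions made for this example.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class SeatHoldStore:
    """In-memory stand-in for the Redis hold map (ticketId -> customerId with a TTL)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()  # stands in for Redis executing commands one at a time
        self._holds: Dict[str, Tuple[str, float]] = {}  # ticket_id -> (customer_id, expires_at)

    def try_hold(self, ticket_id: str, customer_id: str) -> bool:
        """Hold the ticket only if no unexpired hold exists (like SET ... NX EX)."""
        now = self._clock()
        with self._lock:
            current = self._holds.get(ticket_id)
            if current is not None and current[1] > now:
                return False  # someone (possibly the same customer) already holds it
            self._holds[ticket_id] = (customer_id, now + self._ttl)
            return True

    def release(self, ticket_id: str, customer_id: str) -> bool:
        """Release the hold only if this customer still owns it (compare-and-delete)."""
        with self._lock:
            current = self._holds.get(ticket_id)
            if current is None or current[0] != customer_id or current[1] <= self._clock():
                return False
            del self._holds[ticket_id]
            return True

    def holder(self, ticket_id: str) -> Optional[str]:
        """Return the current holder, or None if the ticket is free or the hold expired."""
        with self._lock:
            current = self._holds.get(ticket_id)
            if current is None or current[1] <= self._clock():
                return None
            return current[0]
```

Passing a fake clock makes the expiry testable: a second customer's `try_hold` fails while the hold is live and succeeds once the clock moves past the TTL.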
_How to ensure no double booking?_
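> In the commit API, require an idempotentKey and store it in the database (under the unique constraint on idempotent_key) in the same step that calls the PSP, so a retried request is detected instead of being charged again.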
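A minimal sketch of the idempotent commit, relying on the unique constraint on idempotent_key. To stay runnable it uses an in-memory SQLite database as a stand-in for PostgreSQL; the `status` column, the `charge` callback (a hypothetical PSP call), and the function names are assumptions of this example rather than part of the design above.

```python
import sqlite3
import uuid
from typing import Callable


def open_db() -> sqlite3.Connection:
    """Create an in-memory booking table (SQLite stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE booking (
               booking_id     TEXT PRIMARY KEY,
               idempotent_key TEXT NOT NULL UNIQUE,
               customer_id    TEXT NOT NULL,
               status         TEXT NOT NULL
           )"""
    )
    return conn


def commit_booking(conn: sqlite3.Connection, idempotent_key: str,
                   customer_id: str, charge: Callable[[str], None]) -> str:
    """Return the booking_id, calling the PSP at most once per idempotent_key."""
    booking_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO booking VALUES (?, ?, ?, 'PENDING')",
                (booking_id, idempotent_key, customer_id),
            )
    except sqlite3.IntegrityError:
        # The key already exists: this is a retry, so return the earlier booking.
        row = conn.execute(
            "SELECT booking_id FROM booking WHERE idempotent_key = ?",
            (idempotent_key,),
        ).fetchone()
        return row[0]
    charge(idempotent_key)  # hypothetical PSP call; the key lets the PSP deduplicate too
    with conn:
        conn.execute(
            "UPDATE booking SET status = 'CONFIRMED' WHERE booking_id = ?",
            (booking_id,),
        )
    return booking_id
```

If the PSP call fails after the insert, the row stays PENDING in this sketch; a real system would need a reconciliation or retry path for that case.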
Future improvements
_Scaling on DB_
> Since this is a read-heavy system, we can introduce read replicas, Redis as a read cache, and a CDN at edge locations in front of the application to serve static content, especially the event venue screen for superstar events (the venue screen gets the most views and no writes). The attempt page sees more writes, and the booking confirmation page the fewest.
> After query pattern analysis, the Postgres tables can be sharded horizontally by booking key or seat key with consistent hashing, since both are usually queried by primary key. With HAProxy + PGPool in front, a custom header such as X-HASH-KEY can carry the queried key and be used to pick the RDS cluster endpoint.
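Consistent hashing for shard selection can be sketched as a hash ring with virtual nodes. This is an illustrative sketch, not a description of how HAProxy or PGPool route requests: the shard names, the choice of SHA-256, and the 100 virtual nodes per shard are all assumptions of this example.

```python
import bisect
import hashlib
from typing import List, Tuple


class ConsistentHashRing:
    """Map keys (e.g. booking_id or seat_id) to shards with a virtual-node hash ring."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                # Each shard owns many points on the ring to even out the load.
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the shard owning the first ring point clockwise from the key's hash."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.node_for("booking-12345"))
```

The property that matters for resharding is that removing a shard only remaps the keys that lived on it; keys on the remaining shards keep their placement.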
_Scaling on Multi region & Security_
> When it comes to payment, security must be held to a high bar. Assume DB scaling is handled by Postgres read replicas, with pgbouncer and (pgpool + haproxy) in front.
> We scale the microservices with K8s HPA settings. To ensure that all communication between microservices is encrypted, especially around the payment service, we can consider Envoy sidecars with Istio so that pod-to-pod communication uses mutual TLS.