My Solution for Design Ticketmaster with Score: 9/10
by kaleidoscope_echo243
System requirements
Functional:
- User should be able to see nearby movie theaters.
- User should be able to see the list of movies showing at each theater.
- User should be able to select a movie and a seat for the ticket.
- The ticket should be held for the user for 5 to 10 minutes so they can complete payment.
- User should be able to complete the payment.
- Users should receive a confirmation of their booking via email or SMS.
Non-Functional:
- Availability
- Scalability
- Strong consistency for movie ticket counts.
- Moderate latency: checking ticket availability and completing payment can tolerate a delay of up to around 10 seconds.
Capacity estimation
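Assume the system has 10M daily active users (DAU).
Read QPS: assuming each user makes about one read request per day, and rounding the 86,400 seconds in a day up to 100k: 10M / 100k = 100
Assume peak read QPS is twice the average: 2 * 100 = 200
Assume 1% of users buy one ticket per day.
Write QPS: 10M * 1% / 100k = 1
Assume peak write QPS is twice the average: 2 * 1 = 2
So this is a read-heavy system, with roughly 100 reads per write.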
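The arithmetic above can be reproduced with a small helper. This is a sketch: `estimate_qps` and its parameter names are hypothetical, and `reads_per_user_per_day=1` simply makes explicit the one-read-per-user assumption behind 10M / 100k.

```python
SECONDS_PER_DAY = 100_000  # 86,400 rounded up, as in the estimate above


def estimate_qps(dau: int,
                 reads_per_user_per_day: float = 1.0,
                 buyer_percent: float = 1.0,
                 peak_factor: float = 2.0) -> dict:
    """Back-of-the-envelope QPS under the assumptions of this section."""
    read_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
    # Each buyer is assumed to buy exactly one ticket per day.
    write_qps = dau * buyer_percent / 100 / SECONDS_PER_DAY
    return {
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "write_qps": write_qps,
        "peak_write_qps": write_qps * peak_factor,
        "read_write_ratio": read_qps / write_qps,
    }


print(estimate_qps(10_000_000))
# {'read_qps': 100.0, 'peak_read_qps': 200.0, 'write_qps': 1.0,
#  'peak_write_qps': 2.0, 'read_write_ratio': 100.0}
```

Changing `reads_per_user_per_day` shows how sensitive the read estimate is to that assumption: a user who browses ten pages per day multiplies read QPS by ten.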
API design
GET getMovieTheaters: The request will contain the user's locationInfo. The response will be a list of movie theater info.
GET getMoviesListForTheater: The request will be the movie theater id. The response will be a list of movies, each with its available showtimes.
GET getSeatsForMovie: The request will be theater id, movie id and date time info. The response will be a list of seatInfo. The seat info contains the seat position and whether the seat is occupied or not.
POST holdTicket: The request will be the theaterId, movieId, time of the movie and seatId. The response will be the ticket id and the status of the ticket.
GET getTicketStatus: The request will be ticketId. The response will be the status of the ticket.
POST completePaymentForTicket: The request will be the ticketId and paymentInfo (credit card info). The response will indicate whether the payment succeeded.
An auth token will be attached to each API request for authentication.
Each API response may contain an error. The error will carry an HTTP status code, for example 4xx for a client-side error and 5xx for a server-side error, plus a specific error message to display to the user.
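A minimal sketch of the holdTicket request/response and the error shape described above, assuming Python dataclasses as the wire model. The field names in snake_case, the ISO-8601 `movie_time` string and the use of 409 for "seat already taken" are illustrative assumptions, not part of the design above.

```python
from dataclasses import dataclass
from enum import Enum


class TicketStatus(str, Enum):
    # Mirrors the status enum in the Ticket table.
    PENDING = "pending"
    COMPLETE = "complete"
    PAYMENT_FAIL = "paymentFail"
    REFUND = "refund"


@dataclass(frozen=True)
class HoldTicketRequest:
    auth_token: str   # attached to every request for authentication
    theater_id: int
    movie_id: int
    movie_time: str   # hypothetical: ISO-8601 showtime string
    seat_id: int


@dataclass(frozen=True)
class HoldTicketResponse:
    ticket_id: int
    status: TicketStatus


@dataclass(frozen=True)
class ApiError:
    http_status: int
    message: str      # user-facing error message

    @property
    def is_client_error(self) -> bool:
        return 400 <= self.http_status < 500

    @property
    def is_server_error(self) -> bool:
        return 500 <= self.http_status < 600


# Example: a hold that fails because the seat is taken (409 is an assumed code).
err = ApiError(409, "This seat has just been taken. Please pick another one.")
print(err.is_client_error, err.is_server_error)  # True False
```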
Database design
I will use a SQL database, since booking needs transactions and strong consistency for ticket counts.
Theater Table:
- theaterId (primary key)
- theaterInfo
- movieIds (list of movie ids referencing the Movie table)
Movie Table:
- movieId (primary key)
- movieInfo
MovieTime Table:
- movieTimeId (primary key)
- theaterId (foreign key to the Theater table. This field should be indexed)
- movieId (foreign key to the Movie table. This field should be indexed)
- movieTime (timestamp)
Seat Table:
- seatId (primary key)
- theaterId (foreign key to the Theater table. This field should be indexed)
- seatPosition
Ticket Table:
- ticketId (primary key)
- movieId (foreign key to the Movie table. This field should be indexed)
- movieTimeId (foreign key to the MovieTime table. This field should be indexed)
- seatId (foreign key to the Seat table. This field should be indexed)
- status (enum value: pending, complete, paymentFail, refund)
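The table list above can be expressed as DDL. This sketch uses SQLite through Python's standard `sqlite3` module purely so it runs anywhere; a production deployment would use a server database. The Seat table here is an assumed minimal version (the Ticket table references it), and `Theater.movieIds` is left out because a theater's movies can be derived from MovieTime, while a list stored in one column cannot be a real foreign key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Theater (theaterId INTEGER PRIMARY KEY, theaterInfo TEXT);
CREATE TABLE Movie (movieId INTEGER PRIMARY KEY, movieInfo TEXT);
CREATE TABLE MovieTime (
    movieTimeId INTEGER PRIMARY KEY,
    theaterId   INTEGER NOT NULL REFERENCES Theater(theaterId),
    movieId     INTEGER NOT NULL REFERENCES Movie(movieId),
    movieTime   TEXT NOT NULL          -- ISO-8601 timestamp
);
-- Assumed minimal Seat table.
CREATE TABLE Seat (
    seatId       INTEGER PRIMARY KEY,
    theaterId    INTEGER NOT NULL REFERENCES Theater(theaterId),
    seatPosition TEXT NOT NULL         -- e.g. 'F12'
);
CREATE TABLE Ticket (
    ticketId    INTEGER PRIMARY KEY,
    movieId     INTEGER NOT NULL REFERENCES Movie(movieId),
    movieTimeId INTEGER NOT NULL REFERENCES MovieTime(movieTimeId),
    seatId      INTEGER NOT NULL REFERENCES Seat(seatId),
    status      TEXT NOT NULL
        CHECK (status IN ('pending', 'complete', 'paymentFail', 'refund'))
);
-- Indexes on the foreign-key columns.
CREATE INDEX idx_movietime_theater ON MovieTime(theaterId);
CREATE INDEX idx_movietime_movie   ON MovieTime(movieId);
CREATE INDEX idx_seat_theater      ON Seat(theaterId);
CREATE INDEX idx_ticket_movie      ON Ticket(movieId);
CREATE INDEX idx_ticket_movietime  ON Ticket(movieTimeId);
CREATE INDEX idx_ticket_seat       ON Ticket(seatId);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

The CHECK constraint keeps `status` inside the enum, and the foreign keys reject tickets for seats or showtimes that do not exist.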
High-level design
Client: The app the customer uses to book a ticket. It could be a web app or a mobile app.
Load Balancer: Distributes traffic from clients evenly across servers. Traffic can be balanced using path-based routing, round robin, or consistent hashing.
API Gateway: Responsible for authentication, rate limiting and similar cross-cutting concerns.
CDN: Stores static files, for example movie preview images, movie trailers and theater images. Requests for static data are routed to the CDN server physically closest to the client.
Info Service: Serves read requests from clients, for example getting theater info, movie info and seat info.
Ticket Service: Updates the database to hold a ticket and answers ticket status queries.
Payment Service: Validates payment information and updates the ticket status once payment succeeds.
Task Scheduler: Schedules an expiry task when a user holds a ticket. If the user does not pay within the hold period (5 or 10 minutes), the task scheduler updates the ticket's status.
SQL Database: Stores the theater, movie, seat and ticket information.
Database Cache: Caches data from the SQL database so reads are faster.
Notification Service: Sends ticket confirmation notifications to the customer through SMS, email or push notifications.
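The Task Scheduler's hold-expiry behavior can be sketched with an in-memory min-heap ordered by expiry time. Everything here is hypothetical: `HoldExpiryScheduler`, the 10-minute `HOLD_SECONDS` value and the plain dict standing in for the Ticket table. A real deployment would need a durable scheduler or delayed queue so pending expiries survive a restart.

```python
import heapq
from typing import Dict, List, Set, Tuple

HOLD_SECONDS = 10 * 60  # upper end of the 5 to 10 minute hold window


class HoldExpiryScheduler:
    """In-memory stand-in for the Task Scheduler (hypothetical)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int]] = []  # (expires_at, ticket_id)
        self._cancelled: Set[int] = set()

    def schedule(self, ticket_id: int, now: float) -> None:
        heapq.heappush(self._heap, (now + HOLD_SECONDS, ticket_id))

    def cancel(self, ticket_id: int) -> None:
        # Lazy deletion: the heap entry is skipped when it comes due.
        self._cancelled.add(ticket_id)

    def run_due(self, now: float, tickets: Dict[int, str]) -> List[int]:
        """Expire every hold whose deadline has passed; return expired ids."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _, ticket_id = heapq.heappop(self._heap)
            if ticket_id in self._cancelled:
                self._cancelled.discard(ticket_id)
                continue
            # Only a still-pending ticket may expire; a paid one is left alone
            # even if the cancel message was lost.
            if tickets.get(ticket_id) == "pending":
                tickets[ticket_id] = "paymentFail"
                expired.append(ticket_id)
        return expired
```

The `"pending"` check before expiring is a cheap guard against a race between payment completion and the expiry task firing.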
Request flows
- The client requests theater, movie, movie time and seat info.
- The load balancer routes the request to the next available server.
- The request goes through the API gateway, which performs authentication and rate limiting.
- The Info Service receives the request and queries the database cache, falling back to the SQL database on a cache miss.
- The results are sent back to the client.
- The client selects a theater, movie, movie time and seat, then sends a request to the Ticket Service to hold the ticket.
- The Ticket Service checks the SQL database to see whether the seat is still available.
- If it is not available, the Ticket Service sends an error message back to the client.
- If it is available, a new entry with status pending is added to the Ticket table in the SQL database.
- The Ticket Service schedules a task in the Task Scheduler. The task waits for the hold period (5 or 10 minutes) for the client to complete payment. If the client does not complete payment within that period, the task updates the corresponding Ticket table entry's status to paymentFail.
- Within the hold period, the client fills in payment information and sends it to the Payment Service.
- The Payment Service validates the payment information. If the payment succeeds, it updates the corresponding Ticket table entry's status to complete and tells the Task Scheduler to cancel the expiry task.
- If validation fails, the Payment Service sends an error message back to the client, and the client asks the user to re-enter payment information.
- The Ticket Service and Payment Service send ticket status updates to the Notification Service, which notifies the customer through SMS, email or push notifications.
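The booking flow above moves each ticket through a small set of statuses. A sketch of those transitions as an explicit state machine follows; `transition` and `ALLOWED` are hypothetical names, and the `complete -> refund` edge is an assumption, since the flow never describes refunds.

```python
from enum import Enum


class TicketStatus(Enum):
    PENDING = "pending"
    COMPLETE = "complete"
    PAYMENT_FAIL = "paymentFail"
    REFUND = "refund"


# pending -> complete      : payment succeeded
# pending -> paymentFail   : hold expired before payment
# complete -> refund       : assumed, not described in the flow
ALLOWED = {
    TicketStatus.PENDING: {TicketStatus.COMPLETE, TicketStatus.PAYMENT_FAIL},
    TicketStatus.COMPLETE: {TicketStatus.REFUND},
    TicketStatus.PAYMENT_FAIL: set(),
    TicketStatus.REFUND: set(),
}


def transition(current: TicketStatus, target: TicketStatus) -> TicketStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

A failed payment attempt leaves the ticket pending so the user can retry, and a late-firing expiry task cannot turn a completed ticket into paymentFail because `complete -> paymentFail` is rejected.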
Detailed component design
Deep dive into the SQL database.
The most important responsibility of the database is to prevent double booking.
There are two common locking mechanisms to make sure two concurrent requests cannot both successfully update the same database row:
- Pessimistic locking. The row is locked before it is updated, and the lock is released once the update succeeds or fails; only then can another request update the same row. Pessimistic locking is simpler to implement, but it has higher latency because every update must acquire the lock first, and concurrent requests for the same row have to wait.
- Optimistic locking. The row is not locked; instead, each row carries a version number. If two requests try to update the same row at the same time, the first one updates the row and increments its version, and the second one notices the version has changed, returns an error, and does not update the row. Optimistic locking avoids waiting on locks, so latency is lower when conflicts are rare. But when an update fails, the Ticket Service is responsible for detecting the conflict and rolling back to the previous state (or retrying), which makes the implementation more complex.
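The version-number check described for optimistic locking can be sketched as a compare-and-set `UPDATE`. The `SeatInventory` table (one row per seat per showtime, with `status` and `version`) is a hypothetical helper table, not part of the schema above, and SQLite is used only so the example runs standalone.

```python
import sqlite3
from typing import Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical per-showtime seat table with a version column.
    conn.execute(
        "CREATE TABLE SeatInventory (movieTimeId INTEGER, seatId INTEGER,"
        " status TEXT NOT NULL, version INTEGER NOT NULL,"
        " PRIMARY KEY (movieTimeId, seatId))"
    )
    conn.execute("INSERT INTO SeatInventory VALUES (1, 10, 'available', 0)")
    conn.commit()
    return conn


def read_seat(conn, movie_time_id: int, seat_id: int) -> Tuple[str, int]:
    """Read the seat's status and current version (no lock taken)."""
    return conn.execute(
        "SELECT status, version FROM SeatInventory WHERE movieTimeId = ? AND seatId = ?",
        (movie_time_id, seat_id),
    ).fetchone()


def try_hold(conn, movie_time_id: int, seat_id: int, expected_version: int) -> bool:
    """Compare-and-set: succeeds only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE SeatInventory SET status = 'held', version = version + 1"
        " WHERE movieTimeId = ? AND seatId = ? AND version = ? AND status = 'available'",
        (movie_time_id, seat_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_db()
_, version_a = read_seat(conn, 1, 10)  # request A reads version 0
_, version_b = read_seat(conn, 1, 10)  # request B also reads version 0
print(try_hold(conn, 1, 10, version_a))  # True: A wins, version becomes 1
print(try_hold(conn, 1, 10, version_b))  # False: B's version is stale
```

The losing request sees `rowcount == 0` and must report the conflict to the client or retry, which is the extra handling the paragraph above refers to.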
Because the estimated peak write QPS is only 2, I will choose pessimistic locking. Another reason is that the booking workflow does not need low latency; users will tolerate a delay of a couple of seconds. If traffic increases later and the system needs lower latency, I could consider switching to optimistic locking.
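A minimal sketch of the pessimistic approach for holding a ticket: take a write lock, check that no pending or complete ticket exists for the seat, then insert. It assumes SQLite via Python's `sqlite3`, where `BEGIN IMMEDIATE` acquires a database-wide write lock; in PostgreSQL or MySQL the row-level equivalent would be `SELECT ... FOR UPDATE` on a per-seat row. The `hold_ticket` function and the trimmed Ticket table are hypothetical.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts the driver in autocommit mode, so the
    # BEGIN/COMMIT/ROLLBACK statements below are issued exactly as written.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE Ticket (ticketId INTEGER PRIMARY KEY,"
        " movieTimeId INTEGER NOT NULL, seatId INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def hold_ticket(conn: sqlite3.Connection, movie_time_id: int, seat_id: int) -> Optional[int]:
    """Check-then-insert under a write lock; return the new ticketId or None."""
    # BEGIN IMMEDIATE takes SQLite's write lock before we read, so no other
    # writer can insert a ticket for this seat between the check and the insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        taken = conn.execute(
            "SELECT 1 FROM Ticket WHERE movieTimeId = ? AND seatId = ?"
            " AND status IN ('pending', 'complete')",
            (movie_time_id, seat_id),
        ).fetchone()
        if taken is not None:
            conn.execute("ROLLBACK")
            return None
        cur = conn.execute(
            "INSERT INTO Ticket (movieTimeId, seatId, status) VALUES (?, ?, 'pending')",
            (movie_time_id, seat_id),
        )
        conn.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

A second request for the same seat blocks on the lock until the first commits, then sees the pending ticket and gets `None`, which the Ticket Service turns into an error for the client.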
Failure scenarios/bottlenecks
The non-functional requirements for the proposed Ticketmaster system design are scalability, fault tolerance, availability, and latency. Let's discuss how the proposed system fulfills these requirements:
- Scalability: The proposed system can scale to handle an ever-increasing number of users. Resources such as load balancers, web servers, and other servers are added or removed on demand.
- Fault tolerance: Replicating the theater, movie, seat, and ticket tables makes the system fault tolerant. Redundant resources are also available to take over when a server or one of its components fails. A monitoring service improves reliability by continuously observing system health, detecting issues early, providing insights for optimization, and assisting in timely incident response.
- Availability: The system is highly available because it runs redundant servers and replicates data across them. When a user gets disconnected due to a server fault, the load balancer re-creates the session on a different server. Moreover, the data (theater, movie, seat, and ticket data) is stored on separate, redundant database clusters, which provides high availability and durability.
- Latency: We can reduce the system's latency at various levels by:
  - Using geographically distributed servers with associated caches, which brings the service close to users.
  - Using CDNs for frequently accessed media content.