System requirements
Functional:
List functional requirements for the system (Ask the chat bot for hints if stuck.)...
- The platform should help me to select movies
- It should list all the theatre which is running the movie along with showtime
- On selection of movies I should be able to select seat.
- The booked seat should be greyed out
- On confirmation I should be able to make payment.
- It should notify me with all the details
Non-Functional:
List non-functional requirements for the system...
- The system should be available so it can serve millions of users at the same time
- It should be scalable on increasing customer query it should not crash
- Post seat selection the system should be strong consistent so that same seat is not assigned to multiple users.
- It should have low latency to get the data about movie info but for payment latency can be compromised.
Capacity estimation
Estimate the scale of the system you are going to design...
100 Million Daily active users
Read QPS = 100M/100k = 1000
During peak hours = 2 * QPS = 2000
Write will be 1% of users who buys ticket
write QPS = 10
During peak hours = 20
So it is a read heavy system
API design
Define what APIs are expected from the system...
GET /v1/theatres/:locationId
Retrieve list of theatres for this location
Response:
data: [
{
theatreId:
address:{
}
email:
phone:
}
]
GET /v1/movies/:theatreId
We need to provide theatreId and it will return all the movies running in this theatre
Response:
data: [
{
movieId:
movieName:
timeslot:
}
]
GET /theaters/{theaterId}/movies/{movieId}/seats?datetime={dateTime}
This will give the information about seats for a specific theatre and for a specific movie running in that theatre
Response:
{
"theaterId": 123,
"movieId": 456,
"dateTime": "2024-09-25T18:30:00",
"seats": [
{
"position": "A1",
"occupied": true
},
{
"position": "A2",
"occupied": false
},
{
"position": "A3",
"occupied": true
},
{
"position": "B1",
"occupied": false
}
]
}
POST /v1/seats/hold
This request will to hold the seat for the specific movie
Request params
POST /tickets/hold
Content-Type: application/json
{
"theaterId": "123",
"movieId": "456",
"datetime": "2024-09-25T19:30:00",
"seatId": "A1"
}
It will return response based on the availability
200 OK:
{
ticketId:
Status: held
}
400 Bad Request:
If the theatreId, movieId is invalid
500 Internal server error
If not able to hold the seat please try again
GET /ticket/:ticketId/status
It will return the status of the ticket
POST /v1/payment
It will be available to make payment
Request Body:
{
ticketId:
info:{
creditcard no:
expiry date:
cvv no:
}
Response:
200 OK if the payment successful
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
Theatre
- TheatreID
- Location
- City
- List<Show> shows
Show
- ShowId
- MovieId
- TheatreId
- Date showStartTime
- List<Seat> bookedSeats
Seat
- SeatId
- Row
- SeatCategory
Movie
- MovieId
- Name
- Title
- Duration
Bookings
- BookingId
- userId
- createdTime
- status : enum : held, booked, completed, expired
- showId
- List<Seat> bookedSeats
Show_Seats
- showId
- seatId
- status
- timestamp
Database Choices
- Relational Database for User, Booking, and Payment Data:
- Reasoning: User data and booking , show_seat information require strong consistency and relational structure to maintain data integrity and enforce ACID properties.
- Database Type: Relational Database (e.g., PostgreSQL, MySQL).
- CAP Theorem Focus: Consistency is prioritized to ensure accurate and reliable user information and booking details.
- NoSQL Database for Event, Location, Reviews, and Comments Data:
- Reasoning: Event details and location information may vary and evolve, requiring a flexible schema and scalability, making NoSQL databases suitable.
- Database Type: NoSQL Database (e.g., MongoDB, Cassandra).
- CAP Theorem Focus: Partition tolerance and availability are crucial to handle potential changes in data structure and support scalable operations.
- Document Store for Search Data:
- Reasoning: Search data benefits from a document-oriented storage approach, enabling fast and efficient retrieval of movie and event-related information.
- Database Type: Document Store (e.g., Elasticsearch).
- CAP Theorem Focus: Availability is emphasized for quick and responsive search capabilities.
- Media Database for Media Files:
- Reasoning: Multimedia files, such as movie trailers, require efficient storage and retrieval, making media databases suitable for this purpose.
- Database Type: Media Database (e.g., Amazon S3, MongoDB with GridFS).
- CAP Theorem Focus: Emphasis on Availability and Partition Tolerance to ensure media files are accessible and retrievable.
- Key-Value Store for Metadata and Quick Lookups of popular events:
- Reasoning: Metadata and quick lookups can benefit from a key-value store for simplicity and fast access to specific information.
- Database Type: Key-Value Store (e.g., Redis).
- CAP Theorem Focus: Emphasis on Availability and quick lookups, while still maintaining Partition Tolerance.
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
Client: The client customer used to book a ticket. It could be a web app or mobile app.
Load Balance: Balance the traffic from client evenly to different servers. The load could be balanced with path based, round robin or consistent caching approaches.
Api Gateway: Api Gateway will be responsible with authentication, rate limiting and others.
CDN: Store static files, for example movie preview images, movie tailor and theater images. When user request static data, the request will be routed to the CDN which physically close to the client.
Info Service: Info service will be responsible to read request from clients. For example, get theater info, get movie info and get seat info.
Ticket Service: Ticket service will be responsible to update the database to hold a ticket and query a ticket status.
Task Scheduler: The task scheduler will be responsible to schedule a time after user hold a ticket. If the user didn't pay the ticket in a certain time (5 min or 10 min), the task scheduler will be responsible to update the status of the ticket.
SQL Database: The database will be responsible to store the information of the theater, movie, seat and ticket.
Database Cache: Will cache the SQL Database to read data faster.
Notification Service: Notification service will be responsible to send ticket confirmation notification to the client through SMS, email or push notifications.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
- Client request to view theater, movie and movie time and seat info.
- The request will be routed by the load balance to the next available server.
- The request will go through api gateway. The api gateway will do authentication and rate limiting.
- The info service server will get the request and query Database Cache.
- The results of the request will be sent back to the client.
- The client select theater, movie, movie time and seat, then send the request to ticket service to hold the ticket.
- The ticket service check the sql database to see whether the ticket is available or not.
- If the ticket is not available, then the ticket service will send the error message back to the client.
- If the ticket is available, a new entry will be added to the Ticket Table in the sql database. The status of the new ticket entry will be pending.
- The ticket service will trigger a task in the Task Scheduler. The task will wait for a certain time, 5 min or 10 min, waiting for the client to complete the payment. If the client didn't complete the payment within the time period, the Task Scheduler will trigger the task to update the SQL database ticket table. Update the relative entry status to fail.
- Within the time period, the client fill in payment information and send the payment information to the payment service.
- The payment service valid the payment information. If the payment succeed, the payment service will update the relative entry of the SQL database Ticket table. Update the status to payment complete. The payment service will also update the Task Schedule to cancel the task to fail the ticket status.
- If the payment service validate the payment info and the payment info failed, the payment service will send back the error message to the client. The client will ask user to try again for the payment information.
- The ticket service and notification service will send updated information about the ticket to Notification service. The notification service will notify the customer through SMS, email or push notifications.
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Let's talk about database and concurrency. These three operations must either succeed as a whole, or fail as a whole.
- User selects seats for bookings: seats(user_ID, show_ID, array of seat IDs)
- User books. book(user_ID, booking_ID).
- After paying using an external provider, user finalizes booking: finalize_booking(user_ID, booking_ID)
In other words, if anything fails within (1) - (3), the process must be rolled back as if nothing happened. For example:
a. User A selects seat 1.
b. User A books seat 1.
c. User A fails to pay because they forgot their PayPal password.
In this case, after a timeout (e.g. 10 - 15 min), seat 1 must be made available again, so that other users can book them.
We also have to do this with concurrency in mind. When multiple users try to book the same seats on a same show, the system must reject some of the requests, gracefully.
When select_seats() is called, we create rows in Show_Seats table to signify that the seats for this show are held for N minutes. We can express this by setting the rows' status to "held", with a timestamp.
We will use uniqueness constraints on Show_ID and Seat_ID on this table to provide consistency.
This is an example of SQL statement:
insert into Show_Seats table (Show_ID, Seat_ID, status, timestamp)
values (100, 10, held, 4-22-2024 12:00:00),
(100, 11, held, 4-22-2024 12:00:00),
(100, 12, held, 4-22-2024 12:00:00)
This operation would fail and be rolled back, for example, if another client successfully booked seats 12 on the same show. The important thing is that this insert statement inserts all the seats in the booking at once. This way, if the operation fails, it would be rolled back for all the seats.
Going to book() stage, we need to update the status from "held" to "booked".
begin transaction
select status, timestamp from Bookings where booking_ID = 200 and status = 'held' for update
update Bookings set status='booked', timestamp = '4-22-2024 12:00:02' where booking_ID = 200
end transaction
This is using pessimistic locking on the row. This ensures only one thread succeeds to change the status of this row. We assume performance would be acceptable because it will only lock these rows. At the booking or finalization stage, only one client would be accessing these rows, so we think the scalability aspect would be OK.
Note that, although we are using RDB's powerful transaction functionality, we are NOT putting the whole process (select_seats(), book(), finalize_booking()) in one transaction. Doing so would have significant performance and scalability impact, as it would lock rows in DB for minutes. DB transactions are meant to last for a very short amount of time (milliseconds).
Instead, we are using a transaction in one operation (SQL book() and finalize_booking()), which would finish very quickly.
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Dive deep into the SQL Database.
The most important part for the Database is to make sure there is no double booking.
To make sure double booking won't happen, there are two locking mechanisms to make sure two requests won't update the same database row at the same time.
- Pessimistic locking. For the pessimistic locking, when one entry is trying to be updated, the database entry will be locked first. After the entry updated or the update failed, the lock will be released. Then the same entry could be updated by other request. Pessimistic locking is simpler to implement. But it has a higher latency because each the entry update needs to lock the entry first.
- Optimistic locking. For optimistic locking, the database entry won't be locked for update. If two requests try to update the same entry at the same time, the first request will update the entry and update the version number of the entry, then the second request notice the version of the entry has already update, then it will return error message and won't update the entry. The optimistic locking is faster and low latency. But if the request failed, the ticket service will be response to notice that the database entry cannot be updated and reverse to the previous state. The implementation details are more complex.
According to estimated write QPS, which is 2 for peak hours, which is low, I will choose Pessimistic locking. Another reason is that the ticket booking workflow didn't need low latency. User will tolerate for couple second delays. If later the traffic increase, I could consider switch to Optimistic locking if later we have increased traffic and required low latency of the system.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?