System requirements


Functional:

+ Users can buy tickets for a movie. The system should be as fair as possible. Users can only buy a limited number of tickets per purchase. Users can lock (temporarily hold) tickets once they are in the system.

Consistency and availability are the key requirements; low latency is a nice-to-have.



Non-Functional:

+ Authentication

+ Billing

+ Abuse

+ Logging





Capacity estimation

Assuming we have 1 million movies at 2,422 bytes (about 2.4 KB) per row, we would need 2,422 * 1M = 2,422,000,000 bytes (about 2.4 GB).


Assuming every movie has 1K tickets, we would have 1 billion tickets: 1B * 18 bytes = 18,000,000,000 bytes (18 GB).
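
The estimate can be double-checked with a few lines of arithmetic. This is a minimal sketch that takes the per-row sizes from the database design (2,422 bytes per movie, 18 bytes per ticket) and the row counts assumed above. The constant and function names are made up for illustration, and indexes and replication are ignored.

```python
# Row sizes come from the table definitions; row counts are the
# assumptions stated in the estimate (1M movies, 1K tickets each).
MOVIE_ROW_BYTES = 2_422
TICKET_ROW_BYTES = 18
MOVIES = 1_000_000
TICKETS_PER_MOVIE = 1_000


def storage_bytes(rows: int, row_bytes: int) -> int:
    """Raw table size, ignoring indexes and replication."""
    return rows * row_bytes


movies_bytes = storage_bytes(MOVIES, MOVIE_ROW_BYTES)
tickets_bytes = storage_bytes(MOVIES * TICKETS_PER_MOVIE, TICKET_ROW_BYTES)

print(f"movies:  {movies_bytes / 1e9:.2f} GB")
print(f"tickets: {tickets_bytes / 1e9:.2f} GB")
```

The movies table comes out at roughly 2.4 GB and the tickets table at 18 GB, so the tickets table dominates storage.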




API design

+ /api/v1/movies/ [GET]

+ /api/v1/movie/ [GET]

+ /api/v1/movie//reserve [POST]

+ /api/v1/movie//tickets [POST]




Database design

Table Movies

MovieId 4 bytes

Title 100 chars, 400 bytes

Description 500 chars, 2 KB

CreatedDate 4 bytes

LastUpdatedTime 4 bytes

AvailableSeats 2 bytes

CreatedUser 4 bytes

LastUpdatedUser 4 bytes

total: 2,422 bytes


Table Tickets

TicketId 4 bytes

MovieId 4 bytes

Seat Number 2 bytes

CreatedUser 4 bytes

LastUpdatedUser 4 bytes

total: 18 bytes
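
The row totals can be sanity-checked by describing each row as a packed binary layout. This is only a sketch: a real database adds per-row overhead and would not store rows this way. The format strings are an assumption that maps 4-byte fields to unsigned ints, 2-byte fields to unsigned shorts, and the two text fields to fixed-width byte strings (with 2 KB taken as 2,000 bytes, which is what the total above implies).

```python
import struct

# "<" means little-endian with no padding, so sizes add up exactly.
# I = 4-byte unsigned int, H = 2-byte unsigned short, Ns = N raw bytes.
MOVIE_ROW = struct.Struct("<I400s2000sIIHII")
TICKET_ROW = struct.Struct("<IIHII")

print(MOVIE_ROW.size)   # size of one Movies row in bytes
print(TICKET_ROW.size)  # size of one Tickets row in bytes

# Hypothetical ticket: TicketId, MovieId, Seat Number, CreatedUser, LastUpdatedUser
packed_ticket = TICKET_ROW.pack(1, 42, 7, 1001, 1001)
```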


High-level design

We will use two microservices that read from the DB: a reservation service and a movies service.


The orchestration server routes each request to the right microservice, depending on the type of request (fetching movies or making reservations).
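
The routing step could look roughly like the sketch below. The request type names and the two handler functions are placeholders invented for illustration; in a real deployment they would be network calls to the two services.

```python
from typing import Callable, Dict


def movies_service(payload: dict) -> str:
    # Placeholder for a call to the movies microservice.
    return f"movies service handled {payload}"


def reservation_service(payload: dict) -> str:
    # Placeholder for a call to the reservation microservice.
    return f"reservation service handled {payload}"


# Hypothetical request types mapped to the service that owns them.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "fetch_movies": movies_service,
    "make_reservation": reservation_service,
}


def orchestrate(request_type: str, payload: dict) -> str:
    """Forward the request to the matching service, or reject unknown types."""
    handler = ROUTES.get(request_type)
    if handler is None:
        raise ValueError(f"unknown request type: {request_type}")
    return handler(payload)
```

Because every request passes through `orchestrate`, this is also the natural place to hang shared checks such as authentication.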


The cache uses a TTL (time-to-live) eviction strategy.


Whenever users come to our product and make a movie request, we will use the movies microservice to fetch information about the movie.


For very popular movies, we will have an additional cache with an LRU eviction policy to avoid hotspots.
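
For illustration, an LRU cache can be sketched in a few lines with `collections.OrderedDict`. This is a single-process toy, not the cache the design would actually deploy; the class name and capacity are assumptions.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        # A read counts as a use, so move the entry to the "recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry at the "least recent" end.
            self._items.popitem(last=False)
```

Popular movies are read often, so they keep moving to the recent end and stay cached, while rarely requested movies fall out.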


Our partition strategy for movies will be based on MovieId, which we will generate as a random UUID. We will mitigate potential hotspots with a cache.
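
A minimal sketch of mapping a random id to a partition, assuming a fixed partition count (`NUM_PARTITIONS` is a made-up value). Note that a full UUID is 16 bytes, which is wider than the 4-byte MovieId in the table, so the sketch treats the id simply as a `uuid.UUID` object.

```python
import uuid

NUM_PARTITIONS = 8  # assumption: fixed number of database partitions


def new_movie_id() -> uuid.UUID:
    """Random (version 4) UUID, so ids spread evenly across partitions."""
    return uuid.uuid4()


def partition_for(movie_id: uuid.UUID, partitions: int = NUM_PARTITIONS) -> int:
    """Map a movie id to a partition using its 128-bit integer value."""
    return movie_id.int % partitions
```

Plain modulo is the simplest choice, but changing the partition count remaps most ids, so a real system would likely need a more stable mapping.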


Once users reach the movie details page, we will give them the option to make a reservation. This will redirect them to a page that asks how many and which tickets they want to reserve. Once users select the tickets, we will route the request to the reservation service.


The reservation service will check the reservation cache to see whether the seats are currently being reserved, which guards against race conditions. If they are not reserved, it will insert them there with a timestamp. The system will also update the database to reflect that the seats are taken.


Every ticket is created beforehand as a row. Users place a temporary lock on the row.
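
The temporary lock can be sketched as an in-memory hold table with an expiry time. This is a single-process stand-in for the reservation cache, not its actual implementation: the class name, the 5-minute hold, and the injectable clock are all assumptions. The important part is that the check and the insert happen under one lock, so two requests cannot both see the seat as free.

```python
import threading
import time
from typing import Callable, Dict, Iterable, Tuple

HOLD_SECONDS = 300.0  # assumption: how long a seat stays locked


class ReservationCache:
    def __init__(self, hold_seconds: float = HOLD_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hold_seconds = hold_seconds
        self._clock = clock
        # (movie_id, seat) -> (user_id, expires_at)
        self._holds: Dict[Tuple[str, int], Tuple[str, float]] = {}
        self._lock = threading.Lock()

    def try_hold(self, movie_id: str, seats: Iterable[int], user_id: str) -> bool:
        """Hold all seats for the user, or none if any is held by someone else."""
        seats = list(seats)
        now = self._clock()
        with self._lock:  # check and insert happen atomically
            for seat in seats:
                hold = self._holds.get((movie_id, seat))
                if hold is not None and hold[1] > now and hold[0] != user_id:
                    return False
            for seat in seats:
                self._holds[(movie_id, seat)] = (user_id, now + self._hold_seconds)
            return True

    def release(self, movie_id: str, seats: Iterable[int], user_id: str) -> None:
        """Drop the user's holds, for example after an abandoned purchase."""
        with self._lock:
            for seat in seats:
                hold = self._holds.get((movie_id, seat))
                if hold is not None and hold[0] == user_id:
                    del self._holds[(movie_id, seat)]
```

Expired holds are simply treated as free on the next attempt, which plays the role of the TTL.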


Since we can expect large, sudden spikes in traffic, with users on multiple devices trying to get tickets, we shouldn't rely on consistent hashing in our load balancers. We should instead use a random load distribution.
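
Random distribution is simple to sketch: each request picks a server uniformly at random, regardless of which user or device sent it. The server names below are invented for illustration.

```python
import random
from collections import Counter
from typing import Sequence

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical server pool


def pick_server(servers: Sequence[str], rng: random.Random) -> str:
    """Choose a server uniformly at random for the incoming request."""
    return rng.choice(servers)


# Simulate a burst of requests and count how they spread out.
rng = random.Random(0)  # seeded only to make the simulation repeatable
counts = Counter(pick_server(SERVERS, rng) for _ in range(3000))
print(counts)
```

Since the choice ignores the client, opening the site on several devices gives a user no better odds of landing on a less busy server.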


If anything unexpected (a server dies, a request fails, etc.) goes wrong during a request in any user journey, the user will be shown an error message.


If a user tries to abuse the system by submitting tickets for a movie that doesn't exist, we would create an alert in the abuse service so that the user can't use the system for a certain time.
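
One way to picture the abuse check is a small block list with an expiry. Everything here is an assumption made for the sketch: the class, the 10-minute block, and the return strings are not part of the design above.

```python
import time
from typing import Callable, Dict, Set

BLOCK_SECONDS = 600.0  # assumption: how long an abusive user is locked out


class AbuseService:
    def __init__(self, block_seconds: float = BLOCK_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._block_seconds = block_seconds
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}

    def report(self, user_id: str) -> None:
        """Record an alert and block the user for a fixed period."""
        self._blocked_until[user_id] = self._clock() + self._block_seconds

    def is_blocked(self, user_id: str) -> bool:
        until = self._blocked_until.get(user_id)
        return until is not None and until > self._clock()


def submit_tickets(abuse: AbuseService, known_movies: Set[str],
                   user_id: str, movie_id: str) -> str:
    """Reject blocked users, and block users who target a missing movie."""
    if abuse.is_blocked(user_id):
        return "blocked"
    if movie_id not in known_movies:
        abuse.report(user_id)
        return "rejected"
    return "accepted"
```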


For the payment service, we will redirect users to a third-party provider such as Stripe.



Request flows


1) The user lands on the movie listings page.

2) The user selects a movie.

3) The user selects seats for the movie.

4) The request is routed to the reservation service.

5) The reservation service queries the cache: are the seats available?

6) If yes, it inserts them into the cache and marks them as not available.

7) The reservation service updates the DB to mark them as taken.

8) The user is taken to the payment step. If the user finishes the purchase, the database is updated. If the user doesn't finish the purchase, the cache releases the held tickets so they become available again.
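
Steps 5 to 8 can be traced with a small single-threaded sketch. The cache is modelled as a set of held seats and the database as a dict of seat statuses. The status strings ("available", "taken", "sold") are invented for illustration, and two behaviours are assumptions rather than part of the flow above: the DB row is reset to "available" when payment fails, and the cache hold is dropped once a purchase completes.

```python
from typing import Dict, List, Set


def reserve_and_pay(cache: Set[int], db: Dict[int, str],
                    seats: List[int], payment_ok: bool) -> str:
    # 5) Are all requested seats free in both the cache and the DB?
    if any(seat in cache or db.get(seat) != "available" for seat in seats):
        return "unavailable"

    # 6) Insert the seats into the cache so others see them as held.
    cache.update(seats)

    # 7) Mark the seats as taken in the database.
    for seat in seats:
        db[seat] = "taken"

    # 8) Payment: finalize on success, undo on failure (assumption).
    if payment_ok:
        for seat in seats:
            db[seat] = "sold"
        outcome = "purchased"
    else:
        for seat in seats:
            db[seat] = "available"
        outcome = "released"

    # The hold is no longer needed in either case (assumption for success).
    cache.difference_update(seats)
    return outcome
```

The sketch is not safe under concurrency; the check in step 5 and the insert in step 6 would have to be a single atomic operation in the real cache.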



Detailed component design

On the database side, we're using a relational DB since we need to keep a relationship between movies and their seats. Some options could be MySQL or Spanner.


On the cache side, we could use something like Redis or Memcached. We need strong consistency to make sure we don't oversell a ticket. I would prefer Redis since I think it offers everything Memcached does and more.



Trade offs/Tech choices

I chose to add an orchestration service in between, at the expense of latency. This gives us a centralized endpoint that we can use to enforce security and uniformity.




Failure scenarios/bottlenecks

We can run into race conditions when two users try to book the same seat. The reservation service will query the cache to make sure no seat is double-booked.
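
The race can be reproduced in miniature with threads. In this sketch, which uses made-up names and an in-process lock in place of the real cache, many users try to hold the same seat at the same instant and exactly one wins because "check if free" and "mark as held" happen as one atomic step. With Redis, a comparable effect is usually achieved with a set-if-absent write (the `SET` command with the `NX` option).

```python
import threading
from typing import Dict, List


class SeatHolds:
    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._lock = threading.Lock()

    def try_hold(self, seat: str, user: str) -> bool:
        """Atomically claim the seat; only the first caller succeeds."""
        with self._lock:
            if seat in self._owners:
                return False
            self._owners[seat] = user
            return True


def race(users: List[str], seat: str = "A1") -> List[str]:
    """Let all users attempt the same seat at once; return the winners."""
    holds = SeatHolds()
    winners: List[str] = []
    barrier = threading.Barrier(len(users))

    def attempt(user: str) -> None:
        barrier.wait()  # release all threads at the same moment
        if holds.try_hold(seat, user):
            winners.append(user)

    threads = [threading.Thread(target=attempt, args=(u,)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Without the lock, two threads could both pass the `seat in self._owners` check before either writes, which is exactly the double-booking case.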



Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?