Design Ticketmaster - System Design

Requirements

Functional Requirements:

Users can see available shows.
Users can view a seating map to pick seats.

Non-Functional Requirements:

- The system should favor consistency for ticket data.
- Each ticket must be unique, so that no duplicate tickets are ever issued.
- Latency must be low.
- Reads heavily outnumber writes, especially because many people will refresh the page repeatedly while waiting for the buy-ticket button to become available.
- The system should be scalable, especially on the read path.

Capacity Estimation

Assume 100 events per month (say, per region) and 500 tickets per event, with approximately 10,000 people interested in each ticket. That gives 100 * 500 = 50k tickets per month, and with 10,000 people per ticket, 50k * 10,000 = 500,000,000 (500M) people accessing the site.

In terms of storage, assume 500 bytes for each user's information (name, address, payment information, age, etc.), 1 KB for each event listing (a one-paragraph introduction, name, location, etc.), and 200 bytes for each ticket (event, person, seat). With 500M people, that is 500 bytes * 500M = 250B bytes, or 250 GB of user data; 100 * 1 KB = 100 KB of event info; and 50k tickets * 200 bytes = 10M bytes, or 10 MB of ticket data. A managed cloud database is enough for this; nothing exotic is needed. For images, assuming 1 MB per image and 3 images per event, that is 3 MB * 100 = 300 MB, which we would still put in blob storage so it can grow as we scale.

In terms of requests, 500M people access the site over the month. Skipping ahead to the moment they try to get a ticket, each waiting person is probably making at least 1 request per second during that spike. If all 500M were active at once, that would be 500M requests per second, which is a worst-case upper bound; a single event with 500 tickets * 10,000 interested people is about 5M people, so roughly 5M requests per second for one sale. Either way, we need to scale out heavily and put robust load balancing in front to keep latency low.

People will always be looking around for tickets, especially at sale time, but even outside the busy season we can expect people to view and search. Counting that as 2 requests per person per month, the baseline is 2 * 500M = 1B requests per month, or roughly 380 requests per second.
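The arithmetic in this section can be checked with a few lines of Python. This is only a sketch of the estimate: the constant names are made up for illustration, sizes use decimal units (1 KB = 1,000 bytes), and a 30-day month is assumed.

```python
# Back-of-envelope capacity estimate using the figures from this section.
# Constant names are illustrative; a 30-day month is an assumption.
EVENTS_PER_MONTH = 100
TICKETS_PER_EVENT = 500
INTERESTED_PER_TICKET = 10_000

USER_BYTES = 500
EVENT_BYTES = 1_000
TICKET_BYTES = 200
IMAGE_BYTES = 1_000_000
IMAGES_PER_EVENT = 3

SECONDS_PER_MONTH = 30 * 24 * 60 * 60

tickets_per_month = EVENTS_PER_MONTH * TICKETS_PER_EVENT
users_per_month = tickets_per_month * INTERESTED_PER_TICKET
users_per_event = TICKETS_PER_EVENT * INTERESTED_PER_TICKET

user_storage = users_per_month * USER_BYTES
event_storage = EVENTS_PER_MONTH * EVENT_BYTES
ticket_storage = tickets_per_month * TICKET_BYTES
image_storage = EVENTS_PER_MONTH * IMAGES_PER_EVENT * IMAGE_BYTES

# Off-season baseline: 2 requests per person per month.
baseline_requests_per_month = 2 * users_per_month
baseline_rps = baseline_requests_per_month / SECONDS_PER_MONTH

print(f"tickets/month:   {tickets_per_month:,}")
print(f"users/month:     {users_per_month:,}")
print(f"users/event:     {users_per_event:,}")
print(f"user storage:    {user_storage / 1e9:.0f} GB")
print(f"ticket storage:  {ticket_storage / 1e6:.0f} MB")
print(f"image storage:   {image_storage / 1e6:.0f} MB")
print(f"baseline load:   {baseline_rps:.0f} requests/second")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an assumption (for example, people per ticket) changes.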

API Design

POST /user body: User - Create User

PUT /user body: User - Update User

POST /login body: User - User Login

POST /logout body: User - User Logout

POST /event body: Event - Create Event

PUT /event body: Event - Update Event

GET /event?filter - list events, filter being: event ID, location, artist name, etc.

GET /event/{event_id} - specific event information, event page.

POST /ticket/reserve - Reserve a ticket for an event (shopping cart)

DELETE /ticket/reserve/{event_id} - release the reservation for an event

POST /ticket/buy/{event_id} - buy the ticket for an event

GET /ticket/{event_id} - View User's tickets

POST /upload body: multipartfile - image - uploads image

PUT /upload/{image_id} - edit image

DELETE /upload/{image_id} - delete image
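As an illustration of the event search endpoint, here is a minimal sketch of how a client might build the filter as a query string. The host name and the parameter names `location` and `artist` are assumptions for the example, not part of the API list above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_event_search_url(base_url, **filters):
    """Build a GET /event URL, passing each filter as a query parameter."""
    # Drop filters the caller left unset so the URL stays short.
    params = {name: value for name, value in filters.items() if value is not None}
    # Sort so the same filters always produce the same URL (cache friendly).
    query = urlencode(sorted(params.items()))
    return f"{base_url}/event?{query}" if query else f"{base_url}/event"


# Hypothetical host and filter names, for illustration only.
url = build_event_search_url(
    "https://api.example.com", location="Seattle", artist="Some Band", event_id=None
)
print(url)
print(parse_qs(urlsplit(url).query))
```

Producing a stable parameter order is a small design choice that helps any cache keyed on the full URL.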

High-Level Design

For all user-related requests, such as login, logout, and creating a user, the API gateway routes the request to the UserService.

For event administration, such as creating events and setting the number of tickets, the request is sent to the EventAdminSvc. Images uploaded for events are stored in blob storage.

For users viewing events, we send the request to the EventService. The EventService is designed mainly for users reading events, and we can expect heavy load here. Therefore, we will have a cache that stores data for current and upcoming events, which makes these requests faster. On a cache miss, the request falls through to the database.
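The read path described here is a cache-aside pattern. Below is a minimal sketch, assuming an in-memory dictionary as a stand-in for a real cache such as Redis or Memcached; `EventCache`, `get_event`, and `load_from_db` are hypothetical names, and the expiry time is an assumed setting.

```python
import time


class EventCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # event_id -> (expires_at, event)

    def get(self, event_id):
        entry = self._entries.get(event_id)
        if entry is None:
            return None
        expires_at, event = entry
        if self._clock() >= expires_at:
            # Entry is stale: evict it and report a miss.
            del self._entries[event_id]
            return None
        return event

    def put(self, event_id, event):
        self._entries[event_id] = (self._clock() + self._ttl, event)


def get_event(event_id, cache, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    event = cache.get(event_id)
    if event is not None:
        return event
    event = load_from_db(event_id)
    if event is not None:
        cache.put(event_id, event)  # populate the cache for later readers
    return event
```

A short expiry keeps event pages reasonably fresh without sending every refresh to the database.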

For users actually buying a ticket, the request is first sent to a queue, something like Amazon SQS. We use the queue to absorb spikes of requests before they reach the server, and to preserve the order in which users reserve tickets (on SQS this requires a FIFO queue, since standard queues only offer best-effort ordering). The workers for the ReserveService pick up requests from the queue and process them. To prevent duplicate ticket reservations, we set a distributed lock on the ticket_id and the event_id for whichever reserve request came in first for that specific ticket. The ticket becomes available again if the reservation period expires or if the user gives up the reservation. If the user goes ahead and buys the ticket, they are sent to the third-party payment portal.
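The first-come, first-served behavior of the queue can be sketched as follows. This is a simplification under stated assumptions: a `collections.deque` stands in for the real queue, a single worker drains it, and the request shape `(user_id, event_id, seat_id)` and the function name are hypothetical.

```python
from collections import deque


def process_reservations(requests, reserved=None):
    """Drain a FIFO queue of (user_id, event_id, seat_id) reservation requests.

    The first request for a given seat wins; later requests for the same
    seat are rejected. Returns the per-request results and the final state.
    """
    reserved = {} if reserved is None else reserved  # (event_id, seat_id) -> user_id
    results = []
    queue = deque(requests)
    while queue:
        user_id, event_id, seat_id = queue.popleft()  # oldest request first
        key = (event_id, seat_id)
        if key in reserved:
            results.append((user_id, key, "rejected"))
        else:
            reserved[key] = user_id
            results.append((user_id, key, "reserved"))
    return results, reserved


results, state = process_reservations(
    [("alice", 1, "A1"), ("bob", 1, "A1"), ("carol", 1, "A2")]
)
for row in results:
    print(row)
```

With several workers running in parallel, the in-memory dictionary is no longer enough, which is where the distributed lock comes in.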

When the user is viewing an event, images are served from the CDN instead of being fetched from the database. This ensures faster image loads on the user's screen.

Database Design

For the database, we will use PostgreSQL. The data is quite uniform, which suits a relational schema. We may have some extra fields on events, based on user preference, but those outliers can be stored as JSON in a PostgreSQL JSON column.

There are going to be a lot of search requests as well, so we want to use Elasticsearch to help with filtering events. We can index events by event_id, location, and artist, matching the filters that the user provides.

As a fallback when the search index is unavailable, we can run the query against PostgreSQL directly.

We'll have User table, Event table, Transaction Table, Reservation Table.

The notable ones would be

Reservation Table:

reserve_id

event_id

user_id

TTL

index(reserve_id, event_id)

Event Table:

event_id

artist(user_id)

start_time

end_time

location_id
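The two tables above can be sketched as a schema. This is a hedged illustration only: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, the column types are assumptions, and the TTL is stored as an absolute `expires_at` timestamp, which is one possible interpretation of the TTL field.

```python
import sqlite3

# SQLite in memory is only a stand-in for PostgreSQL; types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (
    event_id    INTEGER PRIMARY KEY,
    artist_id   INTEGER NOT NULL,   -- artist(user_id) in the notes
    start_time  INTEGER NOT NULL,   -- Unix seconds
    end_time    INTEGER NOT NULL,
    location_id INTEGER NOT NULL
);
CREATE TABLE reservation (
    reserve_id  INTEGER PRIMARY KEY,
    event_id    INTEGER NOT NULL REFERENCES event(event_id),
    user_id     INTEGER NOT NULL,
    expires_at  INTEGER NOT NULL    -- TTL, stored as an absolute Unix time
);
CREATE INDEX idx_reservation ON reservation (reserve_id, event_id);
""")


def active_reservations(conn, event_id, now):
    """Return (reserve_id, user_id) for reservations that have not expired."""
    rows = conn.execute(
        "SELECT reserve_id, user_id FROM reservation "
        "WHERE event_id = ? AND expires_at > ? ORDER BY reserve_id",
        (event_id, now),
    )
    return rows.fetchall()


conn.execute("INSERT INTO event VALUES (1, 7, 1000, 2000, 3)")
conn.execute("INSERT INTO reservation VALUES (1, 1, 10, 100)")  # expires at t=100
conn.execute("INSERT INTO reservation VALUES (2, 1, 20, 200)")  # expires at t=200
print(active_reservations(conn, event_id=1, now=150))
```

Filtering on `expires_at` at read time means expired reservations stop counting immediately, even before a cleanup job deletes them.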

Detailed Component Design

From the earlier capacity estimation, the biggest bottlenecks are search and reservations.

For rate limiting, a sliding window is generally the better choice. Normal searches can be given a higher limit, while reservations get a lower one, or a token bucket, so that only legitimate requests get through. Reservation requests should also be idempotent so that retries are safe.
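A sliding-window limiter of the kind mentioned here can be sketched in a few lines. This is a single-process illustration, not a production design: the class name, the limits, and the per-user key are assumptions, and a real deployment would keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self._clock()
        hits = self._hits[key]
        cutoff = now - self._window
        # Forget requests that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


# Assumed limits for illustration: searches are generous, reservations strict.
search_limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60)
reserve_limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60)
```

Using separate limiter instances per endpoint keeps the search limit high without loosening the limit on reservations.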

When search volume is high, we can reduce search latency by serving queries from Elasticsearch (full-text search over an inverted index) and by caching popular results.

For reservations, we use the queue to ensure that requests are handled in order. As mentioned earlier, when several users want a ticket for a specific seat, the seat goes to whoever asked first, meaning whoever's request comes through the queue first (on Amazon SQS, strict ordering requires a FIFO queue). The main reason to use Amazon SQS is decoupling and buffering: during spikes, a large backlog of requests sits in the queue instead of hitting the service directly.

For requests that time out or fail, for example because of a session timeout, we may choose to re-queue the request.

For scaling, I would autoscale the read-heavy EventService and the ReserveService. During a heavy sale, we would drive autoscaling from how full the queue is, increasing the number of ReserveService workers as the backlog grows.
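One way to turn queue backlog into a worker count is sketched below. The function name, the per-worker throughput, the target drain time, and the worker bounds are all assumptions chosen for illustration; real values would come from load testing.

```python
import math


def desired_workers(queue_depth, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=100):
    """Return how many workers are needed to drain the backlog in time.

    queue_depth:          messages currently waiting in the queue
    per_worker_rate:      messages one worker handles per second (assumed)
    target_drain_seconds: how quickly we want the backlog cleared (assumed)
    """
    capacity_per_worker = per_worker_rate * target_drain_seconds
    needed = math.ceil(queue_depth / capacity_per_worker)
    # Clamp so we never scale to zero or beyond the allowed fleet size.
    return max(min_workers, min(max_workers, needed))


for depth in (0, 10_000, 1_000_000):
    print(depth, desired_workers(depth, per_worker_rate=50, target_drain_seconds=10))
```

The upper bound matters during a sale: it caps cost and protects the database from more concurrent writers than it can handle.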

To prevent the same seat from being sold twice, a distributed lock should be used. Users sometimes reserve by seat type rather than by a specific seat number, so two requests can end up resolving to the same seat. Whenever a user reserves or buys a seat, we take a distributed lock keyed on the event_id and the seat_id.
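The locking behavior can be sketched with an in-memory lock table that expires entries after a time limit. This is only a single-process model of the idea: `SeatLockManager` is a hypothetical name, and a real deployment would keep the lock in a shared store (for example Redis, using `SET` with the `NX` and `PX` options) so that all workers see it.

```python
import time
import uuid


class SeatLockManager:
    """In-memory model of a per-seat lock with expiry (not actually distributed)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # (event_id, seat_id) -> (owner_token, expires_at)

    def acquire(self, event_id, seat_id, ttl_seconds):
        """Return an owner token if the seat was free, otherwise None."""
        key = (event_id, seat_id)
        now = self._clock()
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return None  # someone else holds an unexpired lock
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_seconds)
        return token

    def release(self, event_id, seat_id, token):
        """Release the lock only if the caller still owns it."""
        key = (event_id, seat_id)
        current = self._locks.get(key)
        if current is not None and current[0] == token:
            del self._locks[key]
            return True
        return False
```

The owner token prevents a slow client whose lock has already expired from releasing a lock that now belongs to someone else.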

Should the third-party payment fail while the customer holds a reservation, they can retry the payment until the reservation expires.
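The retry rule can be sketched as a loop bounded by the reservation expiry. This is a simplified model: `charge` is a hypothetical callable standing in for the third-party payment call, the attempt cap is an assumption, and in practice each retry would be triggered by the user rather than by a loop.

```python
import time


def pay_until_expiry(charge, expires_at, clock=time.monotonic, max_attempts=5):
    """Try the payment until it succeeds, the reservation expires, or attempts run out.

    Returns (outcome, attempts) where outcome is "paid", "expired", or "failed".
    """
    attempts = 0
    while attempts < max_attempts and clock() < expires_at:
        attempts += 1
        if charge():  # hypothetical payment call; True means success
            return "paid", attempts
    outcome = "expired" if clock() >= expires_at else "failed"
    return outcome, attempts
```

Checking the expiry before every attempt ensures a payment is never started for a seat that has already been released to other buyers.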