Design Ticketmaster - System Design

System requirements

Functional:

Customers can buy tickets
Customers can view an event
Customers can search for events

Non-Functional:

Strong consistency in the buying process (a ticket must never be sold twice)
Low-latency search
High availability for search; eventual consistency is acceptable

The following topics are out of scope:

Monitoring
Data privacy and security
Authentication
Dealing with images and concert arena diagrams

Capacity estimation

Users: 100 Million

DAU: 10 Million

QPS: 10,000
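The three figures can be cross-checked with simple arithmetic. The sketch below assumes the 10,000 QPS is spread evenly over the day (real traffic is spikier, especially when a popular event goes on sale) and works out how many requests each daily active user would need to make to produce that load.

```python
# Back-of-envelope check using the capacity figures above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 10_000_000  # daily active users
qps = 10_000      # estimated queries per second

# Total requests per day if the QPS were sustained all day.
requests_per_day = qps * SECONDS_PER_DAY

# Requests each active user would make per day at that rate
# (assumes load is spread evenly across the day).
requests_per_user_per_day = requests_per_day / dau

print(f"{requests_per_day:,} requests per day")
print(f"{requests_per_user_per_day:.1f} requests per DAU per day")
```

This gives about 86 requests per active user per day. If typical users make far fewer requests than that, the 10,000 QPS figure can be read as headroom for peaks such as a high-demand on-sale.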

API design

All endpoints require a JWT for user authentication.

Search for events

GET api/search?terms={terms}

Get event

GET api/event/{eventId}

Reserve ticket

POST api/ticket/reserve

Buy ticket

POST api/ticket/confirm
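The two write endpoints do not specify their request bodies. The sketch below shows one hypothetical shape for them; every field name (`ticket_id`, `payment_token`, `reserved_until`) is an assumption for illustration. The JWT would normally travel in an `Authorization: Bearer` header, so it is not part of these bodies.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReserveRequest:
    # Body of POST api/ticket/reserve (hypothetical fields).
    ticket_id: str


@dataclass
class ReserveResponse:
    ticket_id: str
    reserved_until: int  # Unix timestamp when the hold (lock TTL) expires


@dataclass
class ConfirmRequest:
    # Body of POST api/ticket/confirm (hypothetical fields).
    ticket_id: str
    payment_token: str  # opaque token issued by the payment provider


def to_json(body) -> str:
    """Serialize a request/response dataclass to a JSON string."""
    return json.dumps(asdict(body))


print(to_json(ReserveRequest(ticket_id="t-123")))
print(to_json(ConfirmRequest(ticket_id="t-123", payment_token="tok_abc")))
```

Returning `reserved_until` lets the client show a countdown for the purchase window described in the request flow.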

Database design

Event

id
location
name
date

User

id
name
password
email

Ticket

id
event_id
price
purchased
payment_id
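The three entities map directly onto relational tables. The sketch below uses SQLite so it runs anywhere; the column types, constraints and the index on `event_id` are assumptions, and a production deployment would use a server RDBMS such as PostgreSQL or MySQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE event (
    id       INTEGER PRIMARY KEY,
    location TEXT NOT NULL,
    name     TEXT NOT NULL,
    date     TEXT NOT NULL              -- ISO-8601 date/time
);

-- "user" is quoted because USER is a reserved word in some databases.
CREATE TABLE "user" (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    password TEXT NOT NULL,             -- store a salted hash, not plaintext
    email    TEXT NOT NULL UNIQUE
);

CREATE TABLE ticket (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL REFERENCES event(id),
    price      INTEGER NOT NULL,                -- minor currency units (cents)
    purchased  INTEGER NOT NULL DEFAULT 0,      -- 0 = available, 1 = sold
    payment_id TEXT                             -- set when the purchase completes
);

-- Speeds up "all tickets for an event" lookups.
CREATE INDEX idx_ticket_event ON ticket(event_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Storing `price` as an integer number of cents avoids floating-point rounding in payment amounts.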

High-level design

Request flows

A user starts by searching for an event. The search request goes to the event service, which queries a full-text search engine such as Elasticsearch. The result is a list of events that match the search terms.

The user then selects the event they are interested in from the result set, and the client fetches it by its event ID.

Once the event is selected, the user can reserve a ticket. Reserving places a lock on the ticket, which excludes it from the results other users see in the search flow.

The user then has a fixed amount of time to purchase the ticket. If they do not purchase it within that window, the lock is released and the ticket becomes visible in search results again. When the user decides to buy, the purchase service confirms the ticket and calls the payment service to complete the payment.

Once the purchase is complete, the ticket is marked as purchased and is no longer visible.
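The flow above amounts to a small state machine for each ticket. The sketch below encodes it directly; the state and action names (`reserve`, `expire`, `confirm`) are our own labels, not part of the original design.

```python
from enum import Enum


class TicketState(Enum):
    AVAILABLE = "available"   # visible in search results
    RESERVED = "reserved"     # locked for one user, hidden from others
    PURCHASED = "purchased"   # sold, never visible again


# Allowed transitions in the request flow described above.
TRANSITIONS = {
    (TicketState.AVAILABLE, "reserve"): TicketState.RESERVED,
    (TicketState.RESERVED, "expire"): TicketState.AVAILABLE,    # lock TTL ran out
    (TicketState.RESERVED, "confirm"): TicketState.PURCHASED,   # payment succeeded
}


def next_state(state: TicketState, action: str) -> TicketState:
    """Return the new state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a ticket in state {state.value!r}") from None


def is_visible(state: TicketState) -> bool:
    """Only available tickets are shown in search results."""
    return state is TicketState.AVAILABLE
```

Making illegal transitions raise (for example, reserving a purchased ticket) keeps bugs in the services from silently corrupting ticket state.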

For a popular show such as a Taylor Swift concert, a virtual waiting queue can protect the service from being overloaded by a thundering herd: customers are told they are in a queue and must wait until the event service admits them to the reservation process.
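A minimal sketch of such a waiting queue, assuming first-come-first-served ordering and a fixed number of users (`capacity`) allowed into the reservation flow at once. The class and method names are hypothetical, and a real deployment would keep this state in a shared store rather than in one process's memory.

```python
from collections import deque
from typing import Optional


class WaitingRoom:
    """Toy FIFO waiting queue for one high-demand event."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # max users reserving at the same time
        self.waiting = deque()     # users still in line, in arrival order
        self.active = set()        # users currently admitted

    def join(self, user_id: str) -> None:
        # Ignore duplicate joins (e.g. a user refreshing the page).
        if user_id in self.active or user_id in self.waiting:
            return
        self.waiting.append(user_id)
        self._admit()

    def finish(self, user_id: str) -> None:
        """Called when an admitted user buys or their hold expires."""
        self.active.discard(user_id)
        self._admit()

    def position(self, user_id: str) -> Optional[int]:
        """0 if admitted, 1-based place in line if waiting, None if unknown."""
        if user_id in self.active:
            return 0
        try:
            return self.waiting.index(user_id) + 1
        except ValueError:
            return None

    def _admit(self) -> None:
        # Move users from the front of the line while there is capacity.
        while self.waiting and len(self.active) < self.capacity:
            self.active.add(self.waiting.popleft())
```

The `position` value is what the client would poll to show "you are number N in the queue".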

Detailed component design

Ticket Search:

We use a full-text search engine because it handles high read volumes well. Elasticsearch replicates shards across nodes, so it remains available even if some nodes go down. However, search results are only eventually consistent (newly indexed documents become searchable after the next index refresh, which happens every second by default), which is not a problem because events are not added frequently. It also caches frequently run queries, at the cost of some memory. Because it scales horizontally by adding nodes and replicas, a suitably sized cluster can comfortably serve our estimated 10,000 QPS. The Elasticsearch index is updated every time an administrator adds an event.
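Elasticsearch answers these queries from inverted indexes (maps from each term to the documents containing it), built by Lucene. The toy below is not Elasticsearch and ignores relevance scoring, stemming and fuzzy matching; it only illustrates how indexing on event creation and term lookup at query time fit together. `EventIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class EventIndex:
    """Toy inverted index: term -> set of event IDs containing that term."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_event(self, event_id, name, location):
        # Indexing step, run when an administrator adds an event.
        for term in tokenize(f"{name} {location}"):
            self.postings[term].add(event_id)

    def search(self, query):
        """Return sorted IDs of events containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty entries for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)
```

A query only touches the posting lists for its own terms, which is why lookups stay fast as the number of events grows.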

Redis for Distributed Lock:

Redis is a popular choice for distributed locks because it keeps data in memory, which makes reads and writes very fast. To reserve a ticket, we store the ticket ID as a key in a Redis cluster with a TTL. When the TTL expires, the lock is released automatically and the ticket becomes visible again.
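In Redis the reservation would typically be a single atomic `SET lock:ticket:<id> <user_id> NX EX <ttl>`: `NX` makes the write succeed only if no lock exists, and `EX` attaches the TTL. The sketch below is an in-memory stand-in that mimics those semantics so it can run without a Redis server; the key format and class name are assumptions. An injectable clock makes expiry testable.

```python
import time


class TicketLockStore:
    """In-memory stand-in for Redis-style SET ... NX EX ticket locks."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._locks = {}  # key -> (owner_user_id, expires_at)

    @staticmethod
    def _key(ticket_id):
        return f"lock:ticket:{ticket_id}"

    def _live(self, key):
        """Return the lock entry if it has not expired, dropping it otherwise."""
        entry = self._locks.get(key)
        if entry and entry[1] <= self.clock():
            del self._locks[key]  # expired: the lock is released
            return None
        return entry

    def reserve(self, ticket_id, user_id):
        """Acquire the lock only if no live lock exists (the NX behaviour)."""
        key = self._key(ticket_id)
        if self._live(key) is not None:
            return False
        self._locks[key] = (user_id, self.clock() + self.ttl)
        return True

    def holder(self, ticket_id):
        entry = self._live(self._key(ticket_id))
        return entry[0] if entry else None

    def release(self, ticket_id, user_id):
        """Release early, but only if the caller still owns the lock."""
        key = self._key(ticket_id)
        entry = self._live(key)
        if entry and entry[0] == user_id:
            del self._locks[key]
            return True
        return False
```

Storing the user ID as the value lets the purchase service check that the buyer still holds the lock. In real Redis, the check-then-delete in `release` must be done atomically (commonly with a small Lua script), otherwise a lock that expired and was re-acquired by someone else could be deleted by mistake.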

RDBMS:

We use an RDBMS because ticket purchases need ACID transactions. As load grows, we can first introduce a master-replica setup, in which the master handles writes and the replicas serve reads. If that is still not enough, we can combine master-replica with sharding: the data is partitioned across several shards, each with its own master and replicas. This horizontal scaling distributes transactions across shards so that no single machine has to handle the full load.
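One way to route queries in a sharded master-replica setup is sketched below, assuming tickets are sharded by `event_id` (so all tickets for an event live on one shard and a purchase touches only that shard) and reads are spread round-robin over that shard's replicas. The node names and classes are hypothetical. A stable hash (SHA-256) is used because Python's built-in `hash()` of strings changes between processes.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class ShardGroup:
    master: str     # node that accepts writes for this shard
    replicas: list  # read-only copies of this shard


class ShardRouter:
    def __init__(self, groups):
        self.groups = groups
        # One round-robin iterator over replicas per shard (None if no replicas).
        self._rr = [itertools.cycle(g.replicas) if g.replicas else None for g in groups]

    def shard_for(self, event_id):
        """Map an event ID to a shard index with a stable hash."""
        digest = hashlib.sha256(str(event_id).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.groups)

    def write_node(self, event_id):
        return self.groups[self.shard_for(event_id)].master

    def read_node(self, event_id):
        s = self.shard_for(event_id)
        cycle = self._rr[s]
        # Fall back to the master if the shard has no replicas.
        return next(cycle) if cycle else self.groups[s].master
```

Replicas can lag behind the master, so the purchase path should read ticket status from the master. Note also that plain modulo hashing moves most keys when the shard count changes, and keeping one event on one shard is exactly what can create the hot shard discussed below.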

Trade-offs/Tech choices

Full Text Search:

Tuning for high availability weakens consistency: a newly added event may take a short time to appear in search results. This has little impact on the system because events are added infrequently.
When a popular artist such as Taylor Swift has an event, the search service may see degraded performance from the spike in reads. Query caching and additional replicas can mitigate this, at the cost of extra memory and of running the replicas.
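Besides Elasticsearch's own caches, the event service could keep a small application-level cache of results for the hottest queries. The sketch below combines LRU eviction (bounding memory) with a per-entry TTL (bounding staleness, which fits the eventual-consistency requirement for search). `QueryCache`, the query normalisation and the parameters are assumptions for illustration.

```python
import time
from collections import OrderedDict


class QueryCache:
    """Small LRU cache with a per-entry TTL for search results."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # normalised query -> (results, expires_at)

    def get_or_compute(self, query, compute):
        # Normalise so "Taylor Swift " and "taylor swift" share one entry.
        key = query.strip().lower()
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            self._data.move_to_end(key)  # mark as most recently used
            return entry[0]
        results = compute(query)  # cache miss or expired: query the search engine
        self._data[key] = (results, now + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return results
```

During a spike, thousands of identical searches collapse into one backend query per TTL window.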

RDBMS:

If a shard becomes hot because all of the Taylor Swift concert data lives on it, we can mitigate this with shard rebalancing and shard splitting. Alternatively, a NewSQL database such as SingleStore can handle this automatically; the main drawback is the licensing fee.

Failure scenarios/bottlenecks

The RDBMS can become a bottleneck when a popular event causes a surge of writes.
If the Redis cluster suffers a major failure, there will be a window in which the locking mechanism does not work. This degrades the user experience (two users may both believe they have reserved the same ticket), but the purchase process stays consistent because it is committed through an ACID transaction in the RDBMS that only succeeds if the ticket has not already been purchased.
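The guarantee that survives a Redis failure comes from making the purchase a compare-and-set in the database: the update only matches a row that is still unpurchased. The sketch below shows this with SQLite; `confirm_purchase` is a hypothetical helper, and it assumes the payment has already been authorized and its ID is passed in. If it returns `False`, a real system would void or refund that payment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, event_id INTEGER, price INTEGER,"
    " purchased INTEGER NOT NULL DEFAULT 0, payment_id TEXT)"
)
conn.execute("INSERT INTO ticket (id, event_id, price) VALUES (1, 42, 15000)")
conn.commit()


def confirm_purchase(conn, ticket_id, payment_id):
    """Mark a ticket as purchased; return False if it was already sold.

    The "AND purchased = 0" guard means only one of any number of
    concurrent purchase attempts can match the row, even if the Redis
    lock was lost and several users reached this step.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE ticket SET purchased = 1, payment_id = ? "
            "WHERE id = ? AND purchased = 0",
            (payment_id, ticket_id),
        )
        return cur.rowcount == 1


print(confirm_purchase(conn, 1, "pay-1"))  # first buyer wins
print(confirm_purchase(conn, 1, "pay-2"))  # second attempt is rejected
```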

Future improvements

To address the RDBMS bottleneck, we can adopt a sharding strategy that spreads writes across multiple master-replica groups. If a master fails, one of its replicas is promoted to handle writes.
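A minimal sketch of the promotion step for one shard, assuming failure has already been detected and that each replica reports how far it has applied the master's replication log (`replicated_offset`, a hypothetical field). Choosing the most up-to-date healthy replica minimises lost writes; the class names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    replicated_offset: int = 0  # position in the master's log this node has applied


@dataclass
class ReplicaSet:
    master: Node
    replicas: list = field(default_factory=list)

    def handle_master_failure(self) -> Node:
        """Promote the most up-to-date healthy replica and return it."""
        candidates = [r for r in self.replicas if r.healthy]
        if not candidates:
            raise RuntimeError("no healthy replica available to promote")
        new_master = max(candidates, key=lambda r: r.replicated_offset)
        self.replicas.remove(new_master)
        self.master = new_master  # the failed master is dropped from the set
        return new_master
```

Real failover also has to fence the old master so it cannot keep accepting writes if it comes back (split-brain), and with asynchronous replication any writes the promoted replica had not yet applied are lost.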