System requirements


Functional:

  • Customer can buy a ticket
  • Customer can view an event
  • Customer can search for events



Non-Functional:

  • Strong consistency in the buying process (a ticket is never sold twice)
  • Low-latency search
  • High availability for search; eventual consistency is acceptable


The following topics are out of scope and will not be covered:

  • Monitoring
  • Data privacy and security
  • Authentication
  • Dealing with images and concert arena diagrams





Capacity estimation

Users: 100 Million

DAU: 10 Million

QPS: 10,000
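
As a sanity check on these figures, the sketch below works out what they imply per user. It uses only the DAU and QPS numbers above and treats 10,000 QPS as a daily average, which is an assumption; peak traffic around a popular on-sale would be much higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 10_000_000  # DAU from the estimate above
average_qps = 10_000             # QPS from the estimate above (assumed to be an average)

# Total requests per day implied by the average QPS.
requests_per_day = average_qps * SECONDS_PER_DAY

# Requests each daily active user would need to make to produce that load.
requests_per_user_per_day = requests_per_day / daily_active_users

print(f"{requests_per_day:,} requests/day")
print(f"{requests_per_user_per_day:.1f} requests per user per day")
```

Under that assumption the estimate corresponds to 864,000,000 requests per day, or about 86 requests per active user per day.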




API design

All endpoints use a JWT for user authentication.

Search for tickets

GET api/search?terms={}

Get event

GET api/event/{eventID}

Reserve ticket

POST api/ticket/reserve

Buy ticket

POST api/ticket/confirm





Database design

Event

  • id
  • location
  • name
  • date

User

  • id
  • name
  • password
  • email


Ticket

  • id
  • event_id
  • price
  • purchased
  • payment_id
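
The three entities above could be expressed as tables roughly as follows. This is a minimal sketch using SQLite through Python's standard library; the column types, the constraints, the index, and the convention of storing the price in minor units are all assumptions, since the design lists only field names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE event (
    id       INTEGER PRIMARY KEY,
    location TEXT NOT NULL,
    name     TEXT NOT NULL,
    date     TEXT NOT NULL                  -- ISO 8601 string (assumed format)
);
CREATE TABLE user (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    password TEXT NOT NULL,                 -- assumed to hold a hash, never plaintext
    email    TEXT NOT NULL UNIQUE           -- uniqueness is an assumption
);
CREATE TABLE ticket (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL REFERENCES event(id),
    price      INTEGER NOT NULL,            -- assumed to be in minor units (cents)
    purchased  INTEGER NOT NULL DEFAULT 0,  -- 0 = available, 1 = sold
    payment_id TEXT                         -- NULL until the purchase completes
);
-- Assumed index to list the unsold tickets of an event quickly.
CREATE INDEX ticket_by_event ON ticket(event_id, purchased);
"""


def create_schema(conn):
    """Create the event, user and ticket tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```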




High-level design







Request flows

When a customer searches for an event, the request goes to the event service, which queries a full-text search engine such as Elasticsearch. The result is a list of events that match the search term.

The user then selects the event they are interested in from the result set, and the event is fetched by its id.

Once the event is selected, the user has the option to reserve a ticket. Reserving a ticket places a lock on it, which means it is excluded from the result set of subsequent searches.

The user is given a fixed amount of time to purchase the ticket. If they do not purchase it within the allotted time, the lock is released and the ticket becomes visible in search results again. If they decide to purchase, the purchase service confirms the ticket and calls the payment service to complete the payment.

Once the purchase is complete, the ticket is marked as purchased and is no longer visible.

For a popular show like Taylor Swift, a queue can be put in place to prevent the service from being overloaded by a thundering herd. Customers are notified that they are in a queue and must wait for the event service to manage the reservation process.
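
One way to picture the queue is as a waiting room that admits a bounded number of customers into the reservation flow and holds everyone else in arrival order. The sketch below is an in-memory illustration only; the `WaitingRoom` class, its method names, and the `max_active` limit are hypothetical, and a real deployment would keep this state in a shared store rather than in one process.

```python
from collections import deque


class WaitingRoom:
    """FIFO queue that admits a bounded number of customers at a time."""

    def __init__(self, max_active):
        self.max_active = max_active  # hypothetical cap on concurrent reservations
        self.active = set()           # customers currently allowed to reserve
        self.waiting = deque()        # customers waiting, in arrival order

    def join(self, customer_id):
        """Admit the customer if there is room, else return their queue position."""
        if len(self.active) < self.max_active and not self.waiting:
            self.active.add(customer_id)
            return 0                  # 0 means "admitted, no wait"
        self.waiting.append(customer_id)
        return len(self.waiting)      # 1-based position in the queue

    def leave(self, customer_id):
        """Free the customer's slot and admit the next waiting customer, if any."""
        self.active.discard(customer_id)
        if self.waiting and len(self.active) < self.max_active:
            admitted = self.waiting.popleft()
            self.active.add(admitted)
            return admitted
        return None


room = WaitingRoom(max_active=2)
positions = [room.join(c) for c in ("alice", "bob", "carol")]
next_admitted = room.leave("alice")
```

Here the first two customers are admitted immediately, the third is told they are first in the queue, and they are admitted as soon as one of the active customers finishes or times out.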




Detailed component design

Ticket Search:

We use a full-text search engine here because it handles a high read load well. Elasticsearch is highly available and partition tolerant, and it keeps working even if some nodes go down. It is, however, eventually consistent by default: newly indexed documents become searchable only after a short refresh delay. This is not a problem here because events are added infrequently. It also caches popular queries, at the expense of some memory. A suitably sized cluster can scale horizontally well beyond the 10,000 QPS estimated for this system. The Elasticsearch index is updated every time an administrator adds an event.

Redis for Distributed Lock:

Redis is a popular choice for distributed locks. Because it keeps data in memory, reads and writes are very fast. A Redis cluster can store the ticket id as a lock key to reserve a ticket quickly, with a TTL that releases the lock automatically and makes the ticket visible again.
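
A minimal sketch of this lock, assuming the common pattern of a single `SET key value NX EX ttl` command. The functions accept any client exposing `set(name, value, nx=..., ex=...)`, `get` and `delete` in the style of redis-py (with string responses assumed). The `FakeRedis` class is a hypothetical in-memory stand-in so the example runs without a server, and the key prefix and the 10-minute TTL are illustrative choices, not values from the design.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in implementing only the calls used below."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time on the monotonic clock)

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] <= time.monotonic():
            del self._data[key]  # lazily drop expired keys
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        if nx and self._live(name):
            return None          # key already held: NX refuses to overwrite
        expires_at = time.monotonic() + ex if ex is not None else float("inf")
        self._data[name] = (value, expires_at)
        return True

    def get(self, name):
        entry = self._live(name)
        return entry[0] if entry else None

    def delete(self, name):
        return 1 if self._data.pop(name, None) else 0


RESERVATION_TTL_SECONDS = 600  # hypothetical hold time


def reserve_ticket(client, ticket_id, ttl=RESERVATION_TTL_SECONDS):
    """Return a lock token if the ticket was reserved, else None."""
    token = uuid.uuid4().hex
    # NX: set only if the key is absent. EX: expire after ttl seconds.
    if client.set(f"ticket-lock:{ticket_id}", token, nx=True, ex=ttl):
        return token
    return None


def release_ticket(client, ticket_id, token):
    """Release the lock only if the caller still owns it."""
    key = f"ticket-lock:{ticket_id}"
    # Not atomic: a real deployment would do this check-and-delete in one script.
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

Storing a random token as the value lets the holder prove ownership on release, so a customer whose hold has already expired cannot delete a lock that now belongs to someone else.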

RDBMS table:

We use an RDBMS here because ticket purchases need ACID transactions. As we scale, we can first introduce master-replica replication, where the master handles writes and the replicas serve reads. If this still cannot satisfy the load, we can combine master-replica replication with sharding, so that each shard has its own master for writes and its own replicas for reads. This type of horizontal scaling distributes the transactions per second across shards, so no single node has to handle the full load.
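
The routing this implies can be sketched as follows. The example assumes tickets are sharded by event id using a stable hash, and that each shard is described by one primary and a list of replicas; the `ShardRouter` class, the dictionary layout, and the connection names are hypothetical.

```python
import hashlib
import random


def shard_for_event(event_id, shard_count):
    """Map an event id to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(str(event_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardRouter:
    """Pick the database node for a query: primary for writes, a replica for reads."""

    def __init__(self, shards):
        # shards: list of {"primary": name, "replicas": [name, ...]} (hypothetical layout)
        self.shards = shards

    def target(self, event_id, write):
        shard = self.shards[shard_for_event(event_id, len(self.shards))]
        if write or not shard["replicas"]:
            return shard["primary"]              # writes always go to the primary
        return random.choice(shard["replicas"])  # reads are spread over replicas


router = ShardRouter([
    {"primary": "db0-primary", "replicas": ["db0-replica-a", "db0-replica-b"]},
    {"primary": "db1-primary", "replicas": ["db1-replica-a"]},
])
write_node = router.target(event_id=1234, write=True)
read_node = router.target(event_id=1234, write=False)
```

Keying on the event id keeps all tickets for one event on the same shard, which keeps purchase transactions local to a single node but also means a very popular event concentrates its load on that shard.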


Trade offs/Tech choices

Full Text Search:

  • If tuned for high availability, consistency will be weaker. This will not have much impact on the system because events are added infrequently.
  • When a popular artist like Taylor Swift has an event, the search service may experience degraded performance due to high read volume. Query caching and adding replicas could mitigate this, but at the expense of extra memory and the cost of the replicas.

RDBMS:

  • If a hot shard develops because the data for Taylor Swift concerts lives on the same shard, we can mitigate it with shard rebalancing and shard splitting. We could also use a NewSQL database like SingleStore, which handles this automatically; the only drawback is the licensing fee.



Failure scenarios/bottlenecks

  • The RDBMS can become a bottleneck when a popular event leads to a massive number of writes.
  • If there is a major failure of the Redis cluster, there will be a small window in which the locking mechanism fails. This will degrade the user experience, but because the RDBMS is ACID, the purchase process will still be consistent.
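
To illustrate why the purchase stays consistent without the Redis lock, the sketch below guards the write with a conditional update, so only one buyer can flip a ticket from unsold to sold. It uses SQLite from Python's standard library as a stand-in for the RDBMS; the `confirm_purchase` function, the table definition, and the payment ids are assumptions for the example.

```python
import sqlite3


def confirm_purchase(conn, ticket_id, payment_id):
    """Mark the ticket as purchased only if it is still unsold."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE ticket SET purchased = 1, payment_id = ? "
            "WHERE id = ? AND purchased = 0",
            (payment_id, ticket_id),
        )
    # Exactly one row changes for the winning buyer, zero for everyone else.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket ("
    "id INTEGER PRIMARY KEY, event_id INTEGER, price INTEGER, "
    "purchased INTEGER NOT NULL DEFAULT 0, payment_id TEXT)"
)
conn.execute("INSERT INTO ticket (id, event_id, price) VALUES (1, 10, 5000)")

first_buyer = confirm_purchase(conn, 1, "pay_a")   # succeeds
second_buyer = confirm_purchase(conn, 1, "pay_b")  # rejected, ticket already sold
```

The second buyer sees a failed confirmation instead of a double sale, which is the degraded but still consistent experience described above.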




Future improvements

  • To deal with the RDBMS bottleneck, we can adopt a sharding strategy that spreads writes across several masters, each with its own replicas. If a master fails, one of its replicas is promoted to handle writes.