System requirements
Functional:
- Customer can buy ticket
- Customer can view an event
- Customer can search for events
Non-Functional:
- Consistency in the buying process
- Low latency Search
- High Availability for search, eventual consistency accepted
The following topics are out of scope and will not be covered:
- Monitoring
- Data privacy and security
- Authentication
- Dealing with images and concert arena diagrams
Capacity estimation
Users: 100 Million
DAU: 10 Million
QPS: 10,000
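The QPS figure can be sanity-checked with a back-of-envelope calculation. The sketch below assumes 20 requests per active user per day and a 4x peak-to-average ratio. Neither number comes from the requirements above; they are illustrative values chosen to show how a peak in the region of 10,000 QPS could arise from 10 million DAU.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_qps(dau: int, requests_per_user_per_day: float, peak_factor: float) -> tuple[float, float]:
    """Return (average_qps, peak_qps) for a given daily active user count."""
    average_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY
    return average_qps, average_qps * peak_factor


# Assumed inputs: 20 requests/user/day and a 4x peak factor are illustrative only.
average, peak = estimate_qps(dau=10_000_000, requests_per_user_per_day=20, peak_factor=4)
print(f"average ~{average:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

With these assumed inputs the average is roughly 2,300 QPS and the peak roughly 9,300 QPS, which is in line with the 10,000 QPS target.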
API design
All endpoints use a JWT for user authentication
Search for tickets
GET api/search?terms={}
Get event
GET api/event/{eventID}
Reserve ticket
POST api/ticket/reserve
Buy ticket
POST api/ticket/confirm
Database design
Event
- id
- location
- name
- date
User
- id
- name
- password
Ticket
- id
- event_id
- price
- purchased
- payment_id
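The three entities above could be expressed as tables along the following lines. This is a minimal sketch using SQLite through Python's standard library so that it runs anywhere. The column types, the foreign key, storing the price in minor units, and treating `password` as a salted hash are all assumptions that the field lists do not specify.

```python
import sqlite3

SCHEMA = """
CREATE TABLE event (
    id       INTEGER PRIMARY KEY,
    location TEXT NOT NULL,
    name     TEXT NOT NULL,
    date     TEXT NOT NULL                  -- assumed ISO 8601 string
);
CREATE TABLE "user" (                       -- quoted: USER is reserved in some databases
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    password TEXT NOT NULL                  -- assumed to hold a salted hash, never plaintext
);
CREATE TABLE ticket (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL REFERENCES event(id),
    price      INTEGER NOT NULL,            -- assumed minor units, e.g. cents
    purchased  INTEGER NOT NULL DEFAULT 0,  -- 0 = available, 1 = purchased
    payment_id TEXT                         -- NULL until a payment is confirmed
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the event, user and ticket tables on the given connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
```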
High-level design
Request flows
The flow starts when the user/customer searches for an event. This request goes to the event service, which queries a full-text search engine like Elasticsearch. The result is a list of events that match the search term.
The user then selects the event id in the result set for the event they are interested in.
Once the event is selected, the user has the option to reserve a ticket. Reserving generates a lock on the ticket, which means it is now excluded from the result set in the initial search flow.
The user is given a fixed amount of time to purchase the ticket. If they do not purchase it in the allocated time, the lock is released and the ticket becomes visible in the search results again. If they decide to purchase, the purchase is confirmed by the purchase service, which goes through the payment service to take the payment.
Once the purchase is complete, the ticket is marked as purchased and is no longer visible.
For a popular show like Taylor Swift, a queue can be put in place to prevent the service from being overloaded by a thundering herd. The customer is notified that they are in a queue and has to wait for the Event service to manage the reservation process.
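The queue for popular shows can be pictured as a waiting room that admits a bounded number of customers into the reservation flow at a time. The sketch below is a single-process illustration under that assumption. `WaitingRoom`, `max_active`, `join` and `leave` are hypothetical names, and a production version would need shared state across service instances.

```python
from collections import deque
from typing import Optional


class WaitingRoom:
    """Admits at most `max_active` customers to the reservation flow at once."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active = set()      # customers currently allowed to reserve
        self.waiting = deque()   # customers queued in arrival order

    def join(self, user_id: str) -> int:
        """Return 0 if admitted immediately, otherwise the 1-based queue position."""
        if len(self.active) < self.max_active and not self.waiting:
            self.active.add(user_id)
            return 0
        self.waiting.append(user_id)
        return len(self.waiting)

    def leave(self, user_id: str) -> Optional[str]:
        """Free the customer's slot and admit the next waiting customer, if any."""
        self.active.discard(user_id)
        if self.waiting and len(self.active) < self.max_active:
            admitted = self.waiting.popleft()
            self.active.add(admitted)
            return admitted
        return None


room = WaitingRoom(max_active=2)
positions = [room.join(user) for user in ("alice", "bob", "carol")]
next_in = room.leave("alice")  # carol is admitted once alice finishes
```

The position returned by `join` is what the customer would be shown while waiting.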
Detailed component design
Ticket Search:
Here we are using full-text search because it can handle high read volumes. Elasticsearch is highly available and partition tolerant, and it keeps working even if some nodes go down. It does, however, provide only eventual consistency by default: newly indexed documents become searchable after a short refresh delay. This will not be a problem, as events are not frequently added. It also caches the results of popular queries, at the expense of some memory. Read capacity scales horizontally by adding replicas, so a suitably sized cluster should be more than enough for our 10,000 QPS requirement. The Elasticsearch store will be populated every time an administrator adds an event.
Redis for Distributed Lock:
Redis is a very popular option for distributed locks. Because it is in-memory, it has very fast reads and writes. A Redis cluster can therefore store a ticket id very quickly to mark the ticket as reserved, with a TTL that determines when the lock is released and the ticket becomes viewable again.
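The reserve-with-TTL behaviour can be illustrated without a running Redis server. The class below is a toy in-memory stand-in, not a Redis client. It mimics the semantics of a single `SET key value NX EX ttl` command, where the key is set only if it does not already exist and expires after the TTL. The class name, method names and the per-reservation token are assumptions made for illustration.

```python
import time
import uuid
from typing import Callable, Optional


class InMemoryTicketLocks:
    """Toy stand-in for Redis that mimics SET key value NX EX ttl semantics."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks = {}  # ticket_id -> (token, expires_at)

    def reserve(self, ticket_id: str, ttl_seconds: float) -> Optional[str]:
        """Return a lock token if the ticket was free, otherwise None."""
        now = self._clock()
        held = self._locks.get(ticket_id)
        if held is not None and held[1] > now:
            return None  # another customer holds an unexpired reservation
        token = uuid.uuid4().hex
        self._locks[ticket_id] = (token, now + ttl_seconds)
        return token

    def release(self, ticket_id: str, token: str) -> bool:
        """Release the lock only if the caller still owns an unexpired lock."""
        held = self._locks.get(ticket_id)
        if held is not None and held[0] == token and held[1] > self._clock():
            del self._locks[ticket_id]
            return True
        return False


locks = InMemoryTicketLocks()
first = locks.reserve("ticket-1", ttl_seconds=600)   # succeeds, returns a token
second = locks.reserve("ticket-1", ttl_seconds=600)  # fails while the lock is held
```

The token lets the service check that the customer confirming a purchase is the same one who made the reservation, and prevents one customer from releasing another customer's lock.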
RDBMS table:
Here we shall use an RDBMS because we want the ACID nature of these transactions. As we scale, we can introduce techniques like master-replica replication, where the master handles writes and the replicas serve reads. If this still cannot satisfy the load, we can use master-replica with sharding, so that each shard has its own master for writes and replicas for reads. This type of horizontal scaling distributes the transactions per second amongst different shards so that no single node has to handle the full load.
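A rough sketch of how requests might be routed under master-replica with sharding follows. It assumes tickets are sharded by a hash of the event id and that reads rotate across replicas in round-robin order. Both choices, along with the node names, are illustrative assumptions rather than part of the design above.

```python
import hashlib
import itertools


class Shard:
    """One shard: a master for writes and zero or more replicas for reads."""

    def __init__(self, primary: str, replicas: list) -> None:
        self.primary = primary
        # With no replicas, reads fall back to the master.
        self._readers = itertools.cycle(replicas or [primary])

    def node_for(self, is_write: bool) -> str:
        """Writes always go to the master; reads rotate across replicas."""
        return self.primary if is_write else next(self._readers)


def shard_index(event_id: int, shard_count: int) -> int:
    """Map an event id to a shard with a stable hash (assumed sharding key)."""
    digest = hashlib.sha256(str(event_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(shards: list, event_id: int, is_write: bool) -> str:
    """Pick the database node that should serve a request for this event."""
    return shards[shard_index(event_id, len(shards))].node_for(is_write)


shards = [Shard("db0-master", ["db0-r1", "db0-r2"]), Shard("db1-master", ["db1-r1"])]
write_node = route(shards, event_id=42, is_write=True)
read_node = route(shards, event_id=42, is_write=False)
```

Sharding by event id keeps all tickets for an event together, which simplifies transactions but concentrates the load of one popular event on a single shard.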
Trade offs/Tech choices
Full Text Search:
- If tuned for high availability, consistency will be lower. This will not have much impact on the system, as events are not added often
- When a popular artist like Taylor Swift has an event, the service will experience degraded performance due to high reads. Query caching and adding replicas could mitigate this, but this will come at the expense of some memory and the cost of the replicas
RDBMS:
- If we get a hot shard because Taylor Swift concert data all sits in the same shard, we can mitigate this with shard rebalancing and shard splitting. We could also use a NewSQL database like SingleStore, which does this for you automatically; the only issue is the licensing fee
Failure scenarios/bottlenecks
- The RDBMS can become a bottleneck when a popular event leads to massive writes.
- In the event there is a major failure with the Redis clusters, there will be a small window where the locking mechanism fails. This will degrade the user experience but because the RDBMS is ACID, the purchase process will still be consistent
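One way the RDBMS can keep purchases consistent when the lock layer is unavailable is a conditional update that only succeeds while the ticket is still unpurchased. The sketch below uses SQLite for portability and assumes a `ticket` table with the `id`, `purchased` and `payment_id` fields from the database design. The function name is hypothetical, and coordination with the payment service is left out.

```python
import sqlite3


def confirm_purchase(conn: sqlite3.Connection, ticket_id: int, payment_id: str) -> bool:
    """Mark a ticket as purchased only if nobody has bought it yet."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE ticket SET purchased = 1, payment_id = ? "
            "WHERE id = ? AND purchased = 0",
            (payment_id, ticket_id),
        )
    # Exactly one row changes for the winner; a late buyer changes zero rows.
    return cursor.rowcount == 1
```

If two customers race for the same ticket, the database applies only one of the updates, so the second caller receives `False` and can be shown a "ticket no longer available" message.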
Future improvements
- To deal with the RDBMS bottleneck, we can introduce a sharding strategy that spreads writes across shards, each using master-replica replication. If a master fails, one of its replicas is promoted to handle writes