System requirements
Functional:
- Users can browse currently available events and future events
- Users can view seats for a selected show time in real time
- Users can select seats for a future event and book tickets by paying for the seats
Out of Scope of MVP
- Refunding on cancelling a booking (optional, will revisit if we get time)
- Surge pricing for tickets
- User management, registration & login
- Administration - Adding events, ticket classifications, pricing
Assumptions
- This is a movie ticket booking system: the system lists movies and a user can book tickets for a movie
- All tickets are of same price and category
System Properties
- When a user selects seats and moves to payment, the seats are blocked for 5 minutes
- The ticket is confirmed only on successful payment
Non-Functional:
- Latency - Seat availability and show timings are returned within 200ms
- Reliability - Booked seats are immediately reflected for all users; eventual consistency is not acceptable
- Strong Consistency - The system prevents race conditions and ensures no two users can book the same seat
- High Availability - The ticket booking system is available 99.9% of the time for all users to view shows and book tickets
CAP
- C > A - System prioritizes consistency over availability
- Double booking never happens
- It's acceptable for the system to temporarily block seats until payment is completed; this is for correctness
Capacity estimation
- Total Users - 100M, Daily active users (40% of Total users) = 40M DAU
- Average booking attempts per user - 3 tickets a week = 40M x 3 = 120M tickets a week = approx 17M tickets a day
- Total Shows - 100 venues in a city with 4 shows per venue, across 100 cities in a country = 100 x 4 x 100 = 40K shows per day
- Total Tickets - 250 seats per venue, 40K x 250 = 40,000 x 250 = 10M tickets inventory per day
- RPS for booking - approx 115 tickets booked per second (10M / 86400)
- Adjusting for peak traffic - 60% headroom on 115 gives approx 185, rounded to 200 tickets per second (blockbusters, retries, bots etc.)
- Supply vs Demand - Concurrent request estimation
- 17M booking demand for 10M tickets inventory = approx 2 users attempting to book the same seat; with an additional surge on a blockbuster, assume 20 concurrent users per seat
- Short burst estimation - A blockbuster movie running in 1K venues in the country gives 1K x 250 seats x 20 users per seat = an estimated 5M requests within the first 5 minutes of launch = 5M / 300 s = approx 16.7K RPS, rounded up to 20K RPS
- RPS
- Normal load - 200 RPS
- Spike load - 16K - 20K RPS
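The arithmetic above can be rechecked with a short script. This is only a sketch that replays the figures already stated in this section; the variable names are illustrative.

```python
# Back-of-the-envelope capacity estimation using the figures stated above.
SECONDS_PER_DAY = 86_400

total_users = 100_000_000
dau = total_users * 40 // 100                 # 40% of users are active daily
weekly_attempts = dau * 3                     # 3 booking attempts per user per week
daily_attempts = weekly_attempts / 7          # roughly 17M per day

shows_per_day = 100 * 4 * 100                 # venues x shows per venue x cities
daily_inventory = shows_per_day * 250         # 250 seats per venue

avg_booking_rps = daily_inventory / SECONDS_PER_DAY   # roughly 115
peak_booking_rps = avg_booking_rps * 1.6               # 60% headroom, rounded up to 200
demand_per_seat = daily_attempts / daily_inventory     # roughly 1.7, rounded up to 2

# Blockbuster burst: 1K venues, 250 seats each, 20 users per seat, 5 minutes.
burst_requests = 1_000 * 250 * 20
burst_rps = burst_requests / (5 * 60)                  # roughly 16.7K

if __name__ == "__main__":
    print(f"DAU: {dau:,}")
    print(f"Daily inventory: {daily_inventory:,}")
    print(f"Average booking RPS: {avg_booking_rps:.1f}")
    print(f"Peak booking RPS: {peak_booking_rps:.1f}")
    print(f"Burst RPS: {burst_rps:.0f}")
```

The burst figure lands at the lower end of the 16K - 20K spike range, which is why the spike load is quoted as a range rather than a single number.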
API design
Public endpoints - /cities/*, /events/*
Authorised endpoints - /bookings/*
List Cities - Get All cities in a country
GET /1/cities/:countryCode
Optional Request Header: [
"Authorization" : Bearer <OAuth token>
]
Response: 200 OK
Response Body: [A list of City instances]
List Events in a City
Optional filters - if missing, API returns all events near the user's area/city
GET /1/events?city={cityCode}&event={eventName}
Request Header: [
"Authorization" : Bearer <OAuth token> // Optional
]
Response: 200 OK
Response Body: [A list of Event instances in a City]
List a particular event and its venues, tickets available at each venue
GET /1/events/:eventId
Request Header: [
"Authorization" : Bearer <OAuth token> // Optional
]
Response: 200 OK
Response Body: [Details of an event]
Venues in a city where an event is running
GET /1/events/:eventId/venues
Request Header: [
"Authorization" : Bearer <OAuth token> // Optional
]
Response: 200 OK
Response Body: [List of venues that screen the event]
List seats for an event
GET /1/events/:eventId/venues/:venueId/shows/:showId/seats/
Request Header: [
"Authorization" : Bearer <OAuth token> // Mandatory, user is logged in at this point
]
Response: 200 OK
Response Body: [A list of Seats for the event at the venue]
Book tickets for an event
POST /1/bookings
Request Header: [
"Authorization" : Bearer <OAuth token> // Mandatory, user is logged in at this point
]
Request Body: {An instance of Booking request object, contains showId, venueId, selected seats, payment session id}
Response: 201 Created, 202 Accepted (waiting for payment confirmation), 409 Conflict, 429 Too Many Requests
Response Body: [An instance of Booking confirmation which includes tickets]
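A minimal sketch of how the booking request body and its response codes might be modelled. The field names, the `BookingOutcome` enum and the `validate` helper are assumptions made for illustration; only the four fields and the four outcomes come from the endpoint description. The status codes are the standard HTTP meanings (201 Created, 202 Accepted, 409 Conflict, 429 Too Many Requests).

```python
from dataclasses import dataclass
from enum import Enum
from http import HTTPStatus
from typing import Dict, List


@dataclass(frozen=True)
class BookingRequest:
    """Hypothetical shape of the POST /1/bookings request body."""
    show_id: str
    venue_id: str
    seat_ids: List[str]
    payment_session_id: str


class BookingOutcome(Enum):
    CONFIRMED = "confirmed"                # payment done, tickets issued
    AWAITING_PAYMENT = "awaiting_payment"  # seats blocked, payment pending
    SEAT_TAKEN = "seat_taken"              # another user holds or owns a seat
    RATE_LIMITED = "rate_limited"          # caller exceeded the request quota


# Outcome to HTTP status, following the response list for this endpoint.
STATUS_FOR_OUTCOME: Dict[BookingOutcome, HTTPStatus] = {
    BookingOutcome.CONFIRMED: HTTPStatus.CREATED,
    BookingOutcome.AWAITING_PAYMENT: HTTPStatus.ACCEPTED,
    BookingOutcome.SEAT_TAKEN: HTTPStatus.CONFLICT,
    BookingOutcome.RATE_LIMITED: HTTPStatus.TOO_MANY_REQUESTS,
}


def validate(request: BookingRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is well formed."""
    problems = []
    if not request.seat_ids:
        problems.append("at least one seat must be selected")
    if len(set(request.seat_ids)) != len(request.seat_ids):
        problems.append("duplicate seats in request")
    if not request.payment_session_id:
        problems.append("payment session id is required")
    return problems
```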
List historical bookings of the user
List all bookings of a user; if status is missing, all booking entries are returned in reverse chronological order
GET /1/bookings/:userId?status=confirmed|cancelled|pending
Request Header: [
"Authorization" : Bearer <OAuth token> // Mandatory, user is logged in at this point
]
Response: 200 OK
Response Body: [A list of Booking instances, confirmed as well as pending to confirm]
Cancel a Booking
DELETE /1/bookings/:bookingId/
Request Header: [
"Authorization" : Bearer <OAuth token> // Mandatory, user is logged in at this point
]
Request body: {An instance of confirmed booking }
Response: 200 OK
Response Body: [Booking cancellation confirmation ]
Database design
Core Entities
- Event
- eventId
- title
- description
- cast // array of cast instances
- tags // key-value pairs
- rating - enum: UA|A|N
- duration
- ...
- Venue
- venueId
- description
- location // lat/long
- cityCode
- tags // Key value pairs
- facilities // enum of facilities - A/C, Bucket Seats, Food Served etc
- capacity // Total seats, for fast lookup but could be inferred from seats
- Show
- showId
- name // FirstShow|SecondShow|....
- startTime
- endTime // = startTime + event.duration
- RunningEvent
- runningId
- eventId
- venueId
- showId
- capacity // Optional override
- date
- Seat
- seatId
- rowid
- sectionid
- venueId
- seatLabel // A01
- type - enum [Luxury|Premium|Standard]
- SeatAvailability
- availabilityId
- runningId - FK RunningEvent.runningId
- seatId - FK Seat.seatId
- status - enum [Available | Booked | Blocked]
- User
- userid
- name
- location // base location or preferred location
- mobile
- Booking
- bookingid - primary key, twitter-snowflake id, sortable
- status - enum : waiting | confirmed | cancelled
- seats: [{rowId: A, seatId: 02}]
- userid - FK User entity
- eventid - FK Event entity
- createdOn - datetime // when first created
- lastUpdatedOn - datetime // when last modified
- City
- cityCode - char(2)
- countryCode - char(2)
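The SeatAvailability.status enum above implies a small state machine. A sketch of the allowed transitions follows; the transition table is an assumption derived from the requirements (seats are blocked on checkout, booked on successful payment, released on failure or expiry), and the Booked to Available edge assumes a cancellation frees the seat.

```python
from enum import Enum
from typing import Dict, Set


class SeatStatus(Enum):
    AVAILABLE = "Available"
    BLOCKED = "Blocked"
    BOOKED = "Booked"


# Assumed transition table; not a confirmed part of the design.
ALLOWED_TRANSITIONS: Dict[SeatStatus, Set[SeatStatus]] = {
    SeatStatus.AVAILABLE: {SeatStatus.BLOCKED},                    # user moves to payment
    SeatStatus.BLOCKED: {SeatStatus.BOOKED, SeatStatus.AVAILABLE}, # paid, or failed/expired
    SeatStatus.BOOKED: {SeatStatus.AVAILABLE},                     # assumption: cancellation
}


def transition(current: SeatStatus, target: SeatStatus) -> SeatStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting Available to Booked directly means a seat can only be sold after it has been blocked, which is what keeps two users from booking the same seat.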
Read Replica for Events, Venue in a Search Engine
For keyword searches on venues and events, and for natural language and full-text queries, the master data of events, venues and cities is indexed into an Elasticsearch search engine. The read replica is updated via CDC (change data capture) and is eventually consistent. Booking and seat availability information is pulled from the master database.
Elasticsearch schema
Event: {
eventId: "e01",
version: "1.0",
title: { raw: "The Godfather", keyword: "The Godfather"},
description: "A groundbreaking crime drama that revolutionized the genre",
cast: [{name: "Al Pacino", role: "Michael Corleone"}],
duration: 2.55
}
Venue: {
venueId: "v01",
version: "1.0",
title: "PVR MAX",
city: "BLR",
location: {lat: , long: },
capacity: 250,
facilities: ["3D", "recliner"]
}
RunningEvent: {
runningEventId: "re01",
version: "1.0",
venueTitle: "PVR MAX",
city: "BLR",
location: {lat: 12.23, long: 12.78},
capacity: 250,
shows: [{
name: "Firstshow",
startTime:
}]
}
Data Partitioning
The data is partitioned by event id + section id + row id; hashing these keys spreads requests across different partitions. Seats in the same row sit under the same key, so reads and writes for them go to a single partition. For a popular event almost all of the partitions are utilized equally. When a user books more than one seat, the seats are most likely in the same row and therefore end up in the same partition.
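A minimal sketch of the partition routing described above. The function name, the key format and the choice of SHA-256 are assumptions for illustration; any stable hash would serve. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import hashlib


def partition_for(event_id: str, section_id: str, row_id: str, num_partitions: int) -> int:
    """Map an (event, section, row) key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    key = f"{event_id}:{section_id}:{row_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    # Two seats in the same row share the key, so they hit the same partition.
    print(partition_for("e01", "balcony", "A", 64))
    print(partition_for("e01", "balcony", "B", 64))
```

Because the seat label is not part of the key, a multi-seat booking within one row touches a single partition and can be handled in one local transaction.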
High-level design
Request flows
General Read Flow - non-authenticated endpoints
Endpoints "/cities/*", "/events/*", "/search/*"
client > NLB - L4 > Regional ALB - L7 (Terminate SSL) > API Gateway (logging, metrics, AuthZ) > Listing Microservice > Cache > Elasticsearch (On cache miss)
Booking Flow
client > NLB - L4 > Regional ALB - L7 > API Gateway > Booking Service > Block seats in Redis cache with TTL 5 mins > Initiate Payment {async callback } > { On success > book seats } | {on fail > retry once > on second failure return error} > release block on seats
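The booking flow can be sketched as a single-process simulation. `SeatStore` is a hypothetical in-memory stand-in for the Redis holds and the booked-seat records, and `book` and `pay` are illustrative names; none of this is the actual service code. In Redis itself a per-seat hold would typically be taken with `SET key value NX EX 300`, and holding several seats atomically would need a script or transaction, which this sketch does not model.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple

HOLD_TTL_SECONDS = 5 * 60  # seats stay blocked for 5 minutes


class SeatStore:
    """In-memory stand-in for the Redis holds plus the booked-seat records."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._holds: Dict[str, Tuple[str, float]] = {}  # seat -> (user, expiry)
        self._booked: Set[str] = set()

    def _holder(self, seat: str) -> Optional[str]:
        entry = self._holds.get(seat)
        if entry is None:
            return None
        user, expires_at = entry
        if expires_at <= self._clock():  # expired hold, same effect as a TTL
            del self._holds[seat]
            return None
        return user

    def try_hold(self, seats: List[str], user: str) -> bool:
        """Hold all seats or none; fail if any seat is booked or already held."""
        if any(s in self._booked or self._holder(s) is not None for s in seats):
            return False
        expires_at = self._clock() + HOLD_TTL_SECONDS
        for s in seats:
            self._holds[s] = (user, expires_at)
        return True

    def confirm(self, seats: List[str], user: str) -> bool:
        """Mark seats as booked, only if this user still holds all of them."""
        if any(self._holder(s) != user for s in seats):
            return False
        self._booked.update(seats)
        self.release(seats, user)
        return True

    def release(self, seats: List[str], user: str) -> None:
        """Drop the holds that belong to this user; other holds are untouched."""
        for s in seats:
            if self._holder(s) == user:
                del self._holds[s]


def book(store: SeatStore, seats: List[str], user: str, pay: Callable[[], bool]) -> str:
    """Block seats, attempt payment with one retry, then book or release."""
    if not store.try_hold(seats, user):
        return "conflict"
    try:
        for _ in range(2):  # first attempt plus one retry
            if pay():
                return "confirmed" if store.confirm(seats, user) else "hold_expired"
        return "payment_failed"
    finally:
        store.release(seats, user)  # no-op when the seats were confirmed
```

The `hold_expired` outcome covers the case where payment succeeds only after the 5 minute hold has lapsed; the flow above does not say how that case is handled, so a real service would need an explicit policy such as refunding the payment.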
Detailed component design
Trade offs/Tech choices
- Elasticsearch - used for fast reads that list events, venues and total capacity with less than 200ms latency. Supports free-text search and keyword search.
- Primary Database - PostgreSQL in clustered mode; writes for event metadata, venue metadata and order details (booking, payment) happen here. CDC on events and venues streams changes to Elasticsearch
- Microservices - deployed in Kubernetes pods, with failover, load balancing and auto scale up/down. The booking service and the listing service are both stateless
- Redis cache - caches seat details and available capacity, and blocks seats using a TTL; other cached data is evicted with the LRU algorithm
Failure scenarios/bottlenecks
- When concurrent requests are made for the same seat, the seat gets locked for 5 minutes. On payment failure the system tries to unlock the seats, but they can remain inaccessible for a short period of time. This trade-off preserves strong consistency
- When requests surge, the rate limiter denies some requests, so some genuine requests fail. This protects the integrity of the system
- Idempotency - users eager to book tickets may submit the form twice or refresh the page, producing duplicate requests. The rate limiter might deny these. The system also handles idempotency by issuing a token that contains a timestamp; the token is used to identify duplicate requests and deny those booking requests.
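The idempotency token idea can be sketched as follows. The token format (`<issued-at>.<random nonce>`), the 10 minute validity window and the `IdempotencyGuard` class are assumptions for illustration; the text only states that the token carries a timestamp and is used to reject duplicates. A production token would also need to be signed so clients cannot forge one, and the seen-token set would live in a shared store rather than in process memory.

```python
import uuid
from typing import Dict

TOKEN_TTL_SECONDS = 10 * 60  # assumed validity window for a booking token


def issue_token(now: float) -> str:
    """Create a token whose first part is the issue time in whole seconds."""
    return f"{int(now)}.{uuid.uuid4().hex}"


class IdempotencyGuard:
    """Accept each valid token once; reject replays, stale and malformed tokens."""

    def __init__(self) -> None:
        self._seen: Dict[str, int] = {}  # token -> issue time

    def accept(self, token: str, now: float) -> bool:
        issued_text, _, nonce = token.partition(".")
        if not issued_text.isdigit() or not nonce:
            return False  # malformed token
        issued_at = int(issued_text)
        if now - issued_at > TOKEN_TTL_SECONDS:
            return False  # token too old
        if token in self._seen:
            return False  # duplicate submission
        self._seen[token] = issued_at
        return True


if __name__ == "__main__":
    guard = IdempotencyGuard()
    token = issue_token(now=1_000.0)
    print(guard.accept(token, now=1_005.0))  # first submission
    print(guard.accept(token, now=1_006.0))  # page refresh or double click
```

Checking the token before the rate limiter counts the request would let a harmless double submit be rejected as a duplicate instead of consuming the user's quota.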
Future improvements
- Spam filtering
- Fairness - give the next user in line a chance rather than serving booking requests in random order.
- Hot partition - for a Taylor Swift, Messi or Ronaldo event the traffic is going to be huge. One option is to introduce a write-back cache, where all the claims and seat locking happen in the in-memory cache and the data is written to the database once payment is completed.
- A rogue user can continuously claim seats, which locks the requested seats, and then intentionally delay the payment callback. The seats then stay locked for 5 minutes.