System requirements


Functional:

1) Users should be able to view available movies and the seats available for a given show

2) Users should be able to book tickets

3) Users can view their bookings

4) Users can search for a movie or theater



Non-Functional:

1) Low latency for search

2) Handle concurrent bookings to avoid over/double booking

3) Scalable for high throughput during popular events




Capacity estimation

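-> Assume 100M total users, 10% of them daily active (DAU) = 10M DAU, each making about one booking per day

write TPS : 10M bookings/day / ~0.1M seconds/day ~= 100 QPS

read TPS : 10 * write ~= 1000 QPS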

peak time write TPS : 500QPS (5x)

peak time read TPS : 5000QPS (5x)


Storage :

each booking record is around 1 KB :

per day : 10M bookings * 1 KB ~= 10 GB

per year : 365 * 10 GB ~= 3.65 TB
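
The estimate can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes one booking per daily active user, rounds a day to 100K seconds, and sizes storage from the 10M bookings/day figure used for write throughput. All constant names are made up for illustration.

```python
# Back-of-the-envelope capacity estimate; every input is an assumption from the notes.
TOTAL_USERS = 100_000_000
DAU_PERCENT = 10              # 10% of users are active on a given day
SECONDS_PER_DAY = 100_000     # 86,400 rounded up to keep the arithmetic simple
READ_WRITE_RATIO = 10         # reads assumed to be 10x writes
PEAK_FACTOR = 5               # peak traffic assumed to be 5x average
BOOKING_SIZE_BYTES = 1_000    # roughly 1 KB per booking record


def estimate():
    daily_bookings = TOTAL_USERS * DAU_PERCENT // 100  # one booking per active user
    write_qps = daily_bookings // SECONDS_PER_DAY
    read_qps = write_qps * READ_WRITE_RATIO
    storage_gb_per_day = daily_bookings * BOOKING_SIZE_BYTES / 1e9
    return {
        "daily_bookings": daily_bookings,
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_write_qps": write_qps * PEAK_FACTOR,
        "peak_read_qps": read_qps * PEAK_FACTOR,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_per_year": storage_gb_per_day * 365 / 1e3,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```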



API design


search for events :

GET /v1/event?name=n&location=l& .. etc query params

returns list of events


book a ticket :

POST /v1/event/{e_id}/bookings

Request :

{

ticket_ids :

payment_details :

}
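
A minimal sketch of how a client could build this request. The field types are not specified above, so this assumes ticket_ids is a list of string IDs and payment_details is an opaque object passed on to the payment provider. BookingRequest and build_booking_call are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class BookingRequest:
    ticket_ids: list[str]   # assumed: one ID per seat being booked
    payment_details: dict   # assumed: opaque payload for the payment provider

    def validate(self) -> None:
        if not self.ticket_ids:
            raise ValueError("ticket_ids must contain at least one ticket")
        if len(set(self.ticket_ids)) != len(self.ticket_ids):
            raise ValueError("ticket_ids must not contain duplicates")


def build_booking_call(event_id: str, request: BookingRequest) -> tuple[str, str, str]:
    """Return (method, path, JSON body) for the booking endpoint."""
    request.validate()
    path = f"/v1/event/{event_id}/bookings"
    return "POST", path, json.dumps(asdict(request))
```

Rejecting empty or duplicate ticket lists before the call keeps obviously invalid requests away from the Booking Service.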


get event details :


GET /v1/event/{e_id}





Database design


use SQL DB for ACID capabilities


event : id, name, desc, performer, venue, date

ticket : id, event_id, booking_id, seat, price, status

bookings : id, event_id, list of tickets, user_id

user : id, name, email, pwd_hash
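
One possible way to write these tables as DDL, as a sketch only. SQLite (from the Python standard library) stands in for the SQL DB so the example runs without a server. Several details are assumptions rather than part of the design above: the column types, the status values, the UNIQUE (event_id, seat) constraint, and storing the "list of tickets" for a booking as ticket.booking_id instead of a column on bookings. The desc column is renamed description and the user table is named users because DESC and USER are reserved words in PostgreSQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    email     TEXT NOT NULL UNIQUE,
    pwd_hash  TEXT NOT NULL
);
CREATE TABLE event (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    performer   TEXT,
    venue       TEXT,
    date        TEXT
);
CREATE TABLE booking (
    id        INTEGER PRIMARY KEY,
    event_id  INTEGER NOT NULL REFERENCES event(id),
    user_id   INTEGER NOT NULL REFERENCES users(id)
);
CREATE TABLE ticket (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL REFERENCES event(id),
    booking_id INTEGER REFERENCES booking(id),      -- NULL until the seat is booked
    seat       TEXT NOT NULL,
    price      INTEGER NOT NULL,                    -- minor currency units (assumption)
    status     TEXT NOT NULL DEFAULT 'available'
               CHECK (status IN ('available', 'booked')),
    UNIQUE (event_id, seat)                         -- one ticket row per seat per event
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables on the given connection."""
    conn.executescript(SCHEMA)
```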




High-level design

System Overview

This diagram represents a ticket booking system designed to handle:

  1. Event Discovery (via Search Service and Event Service).
  2. Seat Booking and Synchronization (via Booking Service and Redis-based Distributed Lock).
  3. Data Caching and Search Optimization (using Redis and Elasticsearch).
  4. Resilient and Scalable APIs (using an API Gateway with load balancing, rate limiting, and authentication).

Components Explanation

Here’s how each component works to solve the ticket booking problem:

1. API Gateway

  • Acts as the entry point for user requests.
  • Responsibilities:
    • Load Balancing: Distributes incoming requests across multiple instances of services (e.g., Search Service, Event Service, Booking Service).
    • Rate Limiting: Prevents abuse by limiting the number of requests per user.
    • Authentication: Verifies users before processing requests.

2. Search Service

  • Handles search queries for events (e.g., "Find concerts in New York").
  • Fetches data from Elasticsearch, which is updated in near real-time via Change Data Capture (CDC) from the PostgreSQL database.

3. Event Service

  • Retrieves detailed information about specific events (e.g., seat availability, pricing, venue details).
  • Uses Redis Event Cache for fast lookups.
  • If the data is not in Redis, it fetches from the PostgreSQL database and updates Redis for future requests.
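
The lookup described above is the cache-aside pattern. A rough sketch follows, with an in-memory TTL cache standing in for Redis and a plain callable standing in for the PostgreSQL query. The class names, the "event:<id>" key format, and the 60-second TTL are all assumptions.

```python
import time


class TTLCache:
    """In-memory stand-in for the Redis event cache."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


class EventService:
    def __init__(self, cache, load_from_db, ttl_seconds=60):
        self.cache = cache
        self.load_from_db = load_from_db   # callable: event_id -> dict or None
        self.ttl_seconds = ttl_seconds

    def get_event(self, event_id):
        key = f"event:{event_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached                  # cache hit, no database read
        event = self.load_from_db(event_id)
        if event is not None:
            self.cache.set(key, event, self.ttl_seconds)
        return event
```

The TTL bounds how long stale event details can be served. Fast-changing data such as seat availability probably needs a short TTL or explicit invalidation when a booking succeeds.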

4. Booking Service

  • Handles the core booking logic:
    • Processes requests to book one or more seats.
    • Communicates with Redis to acquire a distributed lock to ensure seat availability is checked and updated atomically.
    • Updates PostgreSQL to persist booking data.
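
A sketch of the database side of a booking: the availability check and the update happen in one transaction, so either every requested seat is booked or none is. The conditional UPDATE ... WHERE status = 'available' is an illustrative choice, not something the design above prescribes. SQLite stands in for PostgreSQL, and the table layout, status values, and function names are assumptions.

```python
import sqlite3


class SeatUnavailable(Exception):
    """Raised inside the transaction to force a rollback."""


def setup_demo(conn):
    # Minimal tables for the sketch; columns loosely follow the schema notes.
    conn.executescript("""
        CREATE TABLE booking (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);
        CREATE TABLE ticket (
            id INTEGER PRIMARY KEY, event_id INTEGER, booking_id INTEGER,
            seat TEXT, price INTEGER, status TEXT DEFAULT 'available'
        );
    """)


def book_tickets(conn, booking_id, user_id, event_id, ticket_ids):
    """Book all requested tickets or none. Returns True on success."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO booking (id, event_id, user_id) VALUES (?, ?, ?)",
                (booking_id, event_id, user_id),
            )
            for ticket_id in ticket_ids:
                cursor = conn.execute(
                    "UPDATE ticket SET status = 'booked', booking_id = ? "
                    "WHERE id = ? AND event_id = ? AND status = 'available'",
                    (booking_id, ticket_id, event_id),
                )
                if cursor.rowcount != 1:   # seat missing or already taken
                    raise SeatUnavailable(ticket_id)
        return True
    except SeatUnavailable:
        return False
```

Because the update only matches rows that are still available, a second request for the same seat changes zero rows and is rejected even if the lock in front of it ever fails.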

5. PostgreSQL

  • The primary database for storing all persistent data:
    • Event metadata.
    • Seat availability.
    • User booking information.

6. Redis

  • Plays a dual role:
    • Event Cache: Caches frequently accessed event details to reduce database load.
    • Distributed Lock: Ensures synchronized access to critical resources (e.g., seat availability) in the Booking Service.

7. Elasticsearch

  • Optimizes search performance by indexing event data.
  • Continuously updated via Change Data Capture (CDC) from PostgreSQL.
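
To illustrate how change events keep the search index in sync, here is a toy sketch. A tiny in-memory inverted index stands in for Elasticsearch, and the change-event shape {"op", "id", "row"} is hypothetical, not the format of any particular CDC tool.

```python
from collections import defaultdict


class SearchIndex:
    """Tiny inverted index standing in for Elasticsearch."""

    SEARCHABLE_FIELDS = ("name", "performer", "venue")

    def __init__(self):
        self._docs = {}                  # doc_id -> row
        self._terms = defaultdict(set)   # term -> set of doc_ids

    def _tokens(self, row):
        text = " ".join(str(row.get(field, "")) for field in self.SEARCHABLE_FIELDS)
        return set(text.lower().split())

    def upsert(self, doc_id, row):
        self.delete(doc_id)              # drop stale terms before re-indexing
        self._docs[doc_id] = row
        for term in self._tokens(row):
            self._terms[term].add(doc_id)

    def delete(self, doc_id):
        old = self._docs.pop(doc_id, None)
        if old is not None:
            for term in self._tokens(old):
                self._terms[term].discard(doc_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(*(set(self._terms.get(t, ())) for t in terms))
        return sorted(matches)


def apply_change(index, change):
    """Apply one change event captured from the primary database."""
    if change["op"] in ("insert", "update"):
        index.upsert(change["id"], change["row"])
    elif change["op"] == "delete":
        index.delete(change["id"])
    else:
        raise ValueError(f"unknown op: {change['op']}")
```

Since events are applied after the database commit, the index lags the database slightly, which is why search results are only near real-time.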

End-to-End Flow

1. Searching for Events

  1. User Action: User searches for events through the API Gateway.
  2. API Gateway: Forwards the request to the Search Service.
  3. Search Service:
    • Queries Elasticsearch for event data.
    • Returns the search results.

2. Viewing Event Details

  1. User Action: User selects an event to view details (e.g., available seats).
  2. API Gateway: Forwards the request to the Event Service.
  3. Event Service:
    • Checks Redis Event Cache for event details.
    • If not found, fetches details from PostgreSQL, updates Redis, and returns the response.

3. Booking a Ticket

  1. User Action: User initiates a booking request through the API Gateway.
  2. API Gateway: Routes the request to the Booking Service.
  3. Booking Service:
    • Acquires a Redis Distributed Lock to prevent race conditions.
    • Verifies seat availability in PostgreSQL.
    • Updates the booking record in PostgreSQL.
    • Releases the Redis lock after completing the transaction.
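
A sketch of the lock acquire/release steps. To stay runnable without a Redis server, a small in-memory store mimics the two operations needed: set-if-absent with an expiry (what Redis provides as SET key value NX EX seconds) and delete-only-if-the-value-matches (usually done atomically with a small Lua script in real Redis). The class names, key format, and default TTL are assumptions.

```python
import time
import uuid
from typing import Optional


class InMemoryLockStore:
    """Stand-in for Redis, supporting only what the lock needs."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # key -> (value, expires_at)
        self._clock = clock

    def _purge_if_expired(self, key):
        entry = self._data.get(key)
        if entry is not None and self._clock() >= entry[1]:
            del self._data[key]

    def set_if_absent(self, key, value, ttl_seconds) -> bool:
        self._purge_if_expired(key)
        if key in self._data:
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def delete_if_equal(self, key, value) -> bool:
        self._purge_if_expired(key)
        entry = self._data.get(key)
        if entry is None or entry[0] != value:
            return False
        del self._data[key]
        return True


class SeatLock:
    def __init__(self, store, ttl_seconds=10):
        self.store = store
        self.ttl_seconds = ttl_seconds

    @staticmethod
    def _key(event_id, seat):
        return f"lock:event:{event_id}:seat:{seat}"

    def acquire(self, event_id, seat) -> Optional[str]:
        """Return a token if the lock was taken, or None if someone else holds it."""
        token = uuid.uuid4().hex   # unique per holder, so only the owner can release
        if self.store.set_if_absent(self._key(event_id, seat), token, self.ttl_seconds):
            return token
        return None

    def release(self, event_id, seat, token) -> bool:
        return self.store.delete_if_equal(self._key(event_id, seat), token)
```

The TTL frees the seat if the Booking Service crashes mid-booking. The token check stops a slow holder whose lock already expired from releasing a lock that now belongs to someone else.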






Request flows

1) Get event details : the user sends a request through the API gateway, which routes it to the Event Service. The Event Service checks the Redis cache first and falls back to the event table in PostgreSQL on a miss.


2) Search for event : the Search Service queries Elasticsearch and returns results matched on text and date. Changes in PostgreSQL are replicated to Elasticsearch via Kafka.


3) Book ticket : the Booking Service acquires a distributed lock in Redis (with a TTL), creates the booking, and updates the bookings table. Payment can be handled through a third-party API.




Detailed component design


I don't think we need to dig deep into component design for an HLD interview, so I'm skipping it for now.




Trade offs/Tech choices

Postgres is used as the DB because of its ACID properties, but under heavy write load it may become a bottleneck. We can shard the data and use read replicas to handle read traffic.
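
A sketch of what routing over shards and read replicas could look like. The notes do not pick a shard key, so event_id is assumed here, along with simple hash-modulo placement and round-robin replica selection. ShardRouter and the host names in the usage are hypothetical.

```python
import hashlib
import itertools


class ShardRouter:
    """Route writes to a shard's primary and reads to its replicas."""

    def __init__(self, shards):
        # shards: list of {"primary": str, "replicas": [str, ...]}
        self._shards = shards
        # Fall back to the primary for reads when a shard has no replicas.
        self._replica_cycles = [
            itertools.cycle(shard["replicas"] or [shard["primary"]]) for shard in shards
        ]

    def shard_index(self, event_id: str) -> int:
        # Stable hash: Python's built-in hash() for str varies between processes.
        digest = hashlib.sha256(event_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def for_write(self, event_id: str) -> str:
        return self._shards[self.shard_index(event_id)]["primary"]

    def for_read(self, event_id: str) -> str:
        return next(self._replica_cycles[self.shard_index(event_id)])
```

Sharding by event keeps all tickets for one event on the same shard, so a booking stays a single-shard transaction. The downsides are that one very popular event still lands on one primary, and that modulo placement moves most keys when the shard count changes.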


Elasticsearch is used for full-text and fuzzy search, which reduces search latency.

The cost is added operational overhead and infrastructure expense.


Kafka is used to replicate data from Postgres to Elasticsearch.


Redis is used for distributed locking. It is in-memory and fast, which keeps lock latency low. But a Redis node can crash, which makes the lock unavailable.

We can use Redis Sentinel to monitor Redis availability and promote a replica to primary on failure.





Failure scenarios/bottlenecks

Postgres can become a bottleneck under heavy writes.

Redis can become a bottleneck if it crashes.

Data consistency between Elasticsearch and Postgres, and between Postgres and the Redis cache, can be an issue if not handled properly.




Future improvements


How we can overcome these issues is already mentioned above.