System requirements


Functional:

User registration, login, and profile management

Event search, filtering, and detailed event pages

Real-time seat selection and availability

Ticket reservation (temporary hold)

Secure payment and order confirmation

Ticket delivery (PDF/email/QR code)

Admin portal for event and seat management

Support for refunds/cancellations/transfers


Non-Functional:

High availability (99.99% uptime)

Low latency (sub-second seat updates)

Scalability for millions of users and high concurrency (flash sales)

Consistency (especially for seat availability)

Security (payment, user data, anti-fraud)

Observability (logging, monitoring, alerting)

Disaster recovery (data backup, failover)



Capacity estimation

Assume 10M users, 100K concurrent during big sales/events

Peak: 20K ticket purchases/minute

Each event: up to 70K seats (large stadium), 1K+ events simultaneously

DB: 100M+ tickets/year, 50M+ users, 1B+ seat transactions/year
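The figures above can be turned into per-second rates with a little arithmetic. The sketch below uses only the numbers stated in this section, except `ASSUMED_BYTES_PER_SEAT_KEY`, which is an illustrative guess and not a measured value.

```python
# Back-of-envelope rates derived from the capacity figures above.
PEAK_PURCHASES_PER_MIN = 20_000
TICKETS_PER_YEAR = 100_000_000
SEAT_TRANSACTIONS_PER_YEAR = 1_000_000_000
MAX_SEATS_PER_EVENT = 70_000
SIMULTANEOUS_EVENTS = 1_000
SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumption (not from the requirements): rough size of one cached seat entry.
ASSUMED_BYTES_PER_SEAT_KEY = 100

peak_purchases_per_sec = PEAK_PURCHASES_PER_MIN / 60
avg_tickets_per_sec = TICKETS_PER_YEAR / SECONDS_PER_YEAR
avg_seat_tx_per_sec = SEAT_TRANSACTIONS_PER_YEAR / SECONDS_PER_YEAR
peak_to_avg_ratio = peak_purchases_per_sec / avg_tickets_per_sec

# Upper bound on seat entries if every live event were a full stadium.
seat_keys = MAX_SEATS_PER_EVENT * SIMULTANEOUS_EVENTS
seat_cache_gb = seat_keys * ASSUMED_BYTES_PER_SEAT_KEY / 1e9

print(f"peak purchases/sec:    {peak_purchases_per_sec:.0f}")
print(f"average tickets/sec:   {avg_tickets_per_sec:.2f}")
print(f"average seat tx/sec:   {avg_seat_tx_per_sec:.2f}")
print(f"peak-to-average ratio: {peak_to_avg_ratio:.0f}x")
print(f"seat keys (upper bound): {seat_keys:,} (~{seat_cache_gb:.0f} GB assumed)")
```

Peak purchase traffic is roughly two orders of magnitude above the yearly average, which is why the design sizes for flash sales and not for steady-state load.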




API design

Define what APIs are expected from the system...






Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...






High-level design

Frontend: Client app (web/mobile)

Backend: Event service, User service, Ticketing service, Payment service

Database: Relational DB for transactions, NoSQL for event catalog

Cache: Redis for seat availability and short-lived seat holds, to prevent overselling

Concurrency control: Optimistic locking or atomic seat reservation

Queue: For order processing, ticket delivery, notifications

CDN: For static content & seat maps

Monitoring: Prometheus, Grafana, alerting tools
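As one possible illustration of the optimistic-locking option listed under concurrency control, the sketch below uses SQLite from the Python standard library. The `seats` table, its columns, and the function names are assumptions made for this example, not a schema defined in this design.

```python
import sqlite3

def read_seat(conn, seat_id):
    """Return (status, version) as seen by the caller at read time."""
    return conn.execute(
        "SELECT status, version FROM seats WHERE id = ?", (seat_id,)
    ).fetchone()

def commit_booking(conn, seat_id, user_id, expected_version):
    """Mark the seat sold only if nobody changed it since it was read."""
    cur = conn.execute(
        "UPDATE seats SET status = 'sold', user_id = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND status = 'available'",
        (user_id, seat_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version check failed, i.e. another writer won.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (id TEXT PRIMARY KEY, status TEXT, user_id TEXT, version INTEGER)"
)
conn.execute("INSERT INTO seats VALUES ('A1', 'available', NULL, 0)")
conn.commit()

# Two users read the same version, then both try to book.
_, alice_version = read_seat(conn, "A1")
_, bob_version = read_seat(conn, "A1")
alice_ok = commit_booking(conn, "A1", "alice", alice_version)
bob_ok = commit_booking(conn, "A1", "bob", bob_version)
print(alice_ok, bob_ok)
```

The second writer's update matches no rows because the version has already moved on, so the seat cannot be sold twice even though both users saw it as available.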



Request flows


User Browses Events:


Client sends a request to the API gateway for event listing.

API gateway routes to the Event Service.

Event Service queries the Event Catalog Database or Cache.

Results returned to client.
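The browse flow above is essentially a cache-aside read. A minimal sketch follows, assuming a small in-process TTL cache in place of the real cache and a hypothetical `fake_catalog` loader in place of the Event Catalog Database; none of these names come from the design itself.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for the real cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(key, None)  # drop missing or expired entries
            return None
        return entry[1]

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

def list_events(city, cache, load_from_catalog):
    """Serve from cache when possible; otherwise load and populate the cache."""
    key = f"events:{city}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    events = load_from_catalog(city)
    cache.set(key, events)
    return events

catalog_calls = []

def fake_catalog(city):
    # Hypothetical loader standing in for the Event Catalog Database.
    catalog_calls.append(city)
    return [{"id": "evt-1", "city": city, "name": "Stadium Concert"}]

cache = TTLCache(ttl_seconds=30)
first = list_events("berlin", cache, fake_catalog)
second = list_events("berlin", cache, fake_catalog)
print(len(catalog_calls))
```

Only the first request reaches the catalog; the second is answered from the cache until the entry expires.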



User Selects an Event and Seats:


Client requests seat map for a specific event.

API gateway routes to Ticketing Service.

Ticketing Service fetches seat availability from a fast cache (Redis).

Seat map and availability sent to client.



User Initiates Booking:


Client sends seat selection and booking request.

API gateway forwards to Ticketing/Reservation Service.

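Reservation Service attempts to atomically reserve seat(s) in Redis using a set-if-absent operation with an expiry (for example SET with the NX and EX options, which sets the key and its TTL in one command).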


If successful: seat temporarily held for user.

If failed: user is informed seat is not available.



Reservation Service returns hold confirmation to client.
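The hold step in this flow could look roughly like the sketch below. `hold_seat` is written against the `set(name, value, nx=..., ex=...)` call that redis-py exposes; to keep the example runnable without a server it uses `FakeRedis`, a hypothetical in-memory stand-in that mimics only that one call. The key format and the 10-minute TTL are assumptions.

```python
import time

HOLD_TTL_SECONDS = 600  # assumed hold window; tune to the checkout timeout

class FakeRedis:
    """In-memory stand-in for the subset of a Redis client used below."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, name):
        entry = self._data.get(name)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[name]  # expired keys behave as if they never existed
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        if nx and self._live(name) is not None:
            return None  # like redis-py: None when NX prevents the write
        expires_at = self._clock() + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True

def hold_seat(client, event_id, seat_id, user_id, ttl=HOLD_TTL_SECONDS):
    """Try to place a temporary hold; True means this user now owns it."""
    key = f"hold:{event_id}:{seat_id}"
    return bool(client.set(key, user_id, nx=True, ex=ttl))

now = [0.0]  # controllable clock so expiry can be shown without sleeping
client = FakeRedis(clock=lambda: now[0])

first = hold_seat(client, "evt-1", "A1", "alice")
second = hold_seat(client, "evt-1", "A1", "bob")   # seat is already held
now[0] += HOLD_TTL_SECONDS + 1                      # hold expires unpaid
third = hold_seat(client, "evt-1", "A1", "bob")
print(first, second, third)
```

Setting the value and the expiry in a single command matters: if the lock and its TTL were written in two steps, a crash between them could leave a seat locked forever.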



User Completes Payment:


Client submits payment details.

API gateway routes to Payment Service.

Payment Service processes via payment gateway.


On success: confirms booking, persists to database, triggers ticket generation and notification (via message queue).

On failure: releases seat hold in Redis.


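Ticket Delivery:


Notification Service consumes the booking confirmation message from the queue.

Notification Service generates the digital ticket (PDF with QR code/barcode).

Ticket is delivered to the user via email or SMS.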





Detailed component design

The core components for a scalable online ticketing platform are the Ticket Reservation Service, the Payment Service, and the Notification Service. The Ticket Reservation Service is responsible for managing real-time seat availability and handling atomic reservations. To achieve high concurrency and prevent overselling, this service uses Redis as an in-memory store and locks a seat with an atomic set-if-absent operation when a user attempts to reserve it. Each seat is represented as a unique key (combining event and seat ID), and a short TTL (Time To Live) is set on the lock so that it is released automatically if the payment is not completed, ensuring that seats don’t remain perpetually locked. The lock and its TTL should be written in a single command (for example SET with the NX and EX options); a plain SETNX followed by a separate EXPIRE can leave a lock without a TTL if the service fails between the two calls.
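Users usually pick several seats at once, so the hold should be all-or-nothing, and only the holder should be able to release it. The sketch below models those semantics in-process with a mutex; it is not a Redis client, and `SeatHoldStore` and its methods are hypothetical names. In Redis the same effect would typically need a server-side script or a rollback of partially acquired keys.

```python
import threading
import time

class SeatHoldStore:
    """In-process model of the seat-lock semantics described above."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._holds = {}  # "event_id:seat_id" -> (user_id, expires_at)

    def _is_free(self, key):
        hold = self._holds.get(key)
        return hold is None or hold[1] <= self._clock()

    def hold_all(self, event_id, seat_ids, user_id):
        """Hold every requested seat, or none of them."""
        keys = [f"{event_id}:{seat_id}" for seat_id in seat_ids]
        with self._lock:
            if not all(self._is_free(key) for key in keys):
                return False
            expires_at = self._clock() + self._ttl
            for key in keys:
                self._holds[key] = (user_id, expires_at)
            return True

    def release(self, event_id, seat_id, user_id):
        """Release a hold, but only if it belongs to this user."""
        key = f"{event_id}:{seat_id}"
        with self._lock:
            hold = self._holds.get(key)
            if hold is None or hold[0] != user_id:
                return False
            del self._holds[key]
            return True

store = SeatHoldStore(ttl_seconds=600)
alice_ok = store.hold_all("evt-1", ["A1", "A2"], "alice")
bob_ok = store.hold_all("evt-1", ["A2", "A3"], "bob")    # A2 is taken: nothing held
carol_ok = store.hold_all("evt-1", ["A3"], "carol")      # A3 was left untouched
wrong_owner = store.release("evt-1", "A1", "bob")
right_owner = store.release("evt-1", "A1", "alice")
print(alice_ok, bob_ok, carol_ok, wrong_owner, right_owner)
```

The owner check on release guards against a slow request releasing a seat that has since expired and been re-held by someone else.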

The Payment Service is stateless and horizontally scalable, interfacing asynchronously with external payment gateways for processing transactions. Once payment is successful, it publishes a message to a distributed queue (like Kafka or AWS SQS). This decouples ticket generation and notification from the payment workflow, improving system resilience and throughput. If payment fails, the Payment Service calls the Reservation Service to release the seat lock, making the seat available to others.
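The two payment outcomes described above can be sketched as a single function. Everything here is illustrative: `queue.Queue` stands in for Kafka or SQS, `charge` and `release_hold` are injected callables for the gateway adapter and the Reservation Service, and `PaymentDeclined` is a hypothetical exception type. Persisting the booking to the database is omitted for brevity.

```python
import queue

class PaymentDeclined(Exception):
    """Raised by the (hypothetical) gateway adapter when a charge fails."""

def complete_payment(order, charge, release_hold, booking_events):
    """Charge the order, then either publish a booking event or free the seats."""
    try:
        payment_id = charge(order)
    except PaymentDeclined:
        # Failure path: give every held seat back to the pool.
        for seat_id in order["seat_ids"]:
            release_hold(order["event_id"], seat_id, order["user_id"])
        return {"status": "failed"}
    # Success path: hand off ticket generation and notification to consumers.
    booking_events.put({
        "type": "booking.confirmed",
        "order_id": order["order_id"],
        "payment_id": payment_id,
    })
    return {"status": "confirmed", "payment_id": payment_id}

order = {
    "order_id": "ord-1",
    "event_id": "evt-1",
    "seat_ids": ["A1", "A2"],
    "user_id": "alice",
}
booking_events = queue.Queue()
released = []

def record_release(event_id, seat_id, user_id):
    released.append((event_id, seat_id, user_id))

def approving_charge(order):
    return "pay-123"

def declining_charge(order):
    raise PaymentDeclined("card declined")

ok = complete_payment(order, approving_charge, record_release, booking_events)
failed = complete_payment(order, declining_charge, record_release, booking_events)
print(ok["status"], failed["status"], booking_events.qsize(), released)
```

Because the function only publishes an event on success, slow ticket rendering or email delivery never blocks the payment response.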

The Notification Service consumes messages from the queue for successful bookings, generating digital tickets (with QR/barcode) and delivering them via email or SMS. As this service is also stateless, it can run multiple consumers in parallel to handle high notification volumes during peak sales. Each component relies on robust monitoring, retry logic, and dead-letter queues to ensure reliability and fault tolerance, allowing the entire system to handle heavy traffic, maintain consistency, and recover gracefully from partial failures. This modular, event-driven approach enables each component to scale independently and ensures a seamless user experience even under peak loads.
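A consumer with bounded retries and a dead-letter queue might look like the following sketch. It uses `queue.Queue` from the standard library in place of a real broker; `MAX_ATTEMPTS`, `drain`, and the message fields are assumptions for illustration, and a production consumer would normally add backoff between attempts.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry budget per message

def drain(booking_events, dead_letters, deliver):
    """Process queued messages, retrying failures and dead-lettering the rest."""
    delivered = []
    while True:
        try:
            message = booking_events.get_nowait()
        except queue.Empty:
            return delivered
        attempts = message.get("attempts", 0) + 1
        try:
            deliver(message)
            delivered.append(message["order_id"])
        except Exception:
            retried = {**message, "attempts": attempts}
            if attempts >= MAX_ATTEMPTS:
                dead_letters.put(retried)     # give up; keep for inspection
            else:
                booking_events.put(retried)   # requeue for another attempt

def deliver(message):
    # Hypothetical sender: pretend the address for ord-2 always bounces.
    if message["order_id"] == "ord-2":
        raise RuntimeError("mailbox unavailable")

booking_events = queue.Queue()
dead_letters = queue.Queue()
booking_events.put({"order_id": "ord-1"})
booking_events.put({"order_id": "ord-2"})

delivered = drain(booking_events, dead_letters, deliver)
dead = dead_letters.get_nowait()
print(delivered, dead)
```

The failing message is retried until its budget is spent and then parked, so one bad recipient cannot stall delivery for everyone else.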






Trade offs/Tech choices


Consistency vs. availability: Prioritize strong consistency for seat reservation.

Scalability: Use sharding, caching for hot seats/events.

Performance: Cache reads, async processing for non-critical flows.






Failure scenarios/bottlenecks

Failure scenarios in an online ticketing platform can arise from several bottlenecks. The most critical is the database becoming a single point of contention, especially during flash sales, which can cause slowdowns or downtime if not properly sharded or replicated. Cache inconsistency between Redis and the main database may lead to incorrect seat availability and possible overselling, severely impacting user trust. If services like the reservation or payment module lack redundancy, their failure can bring down the entire booking flow. Message queues can also become bottlenecks if consumers are too slow or crash, delaying ticket issuance and notifications during high-traffic periods. Network partitions or connectivity loss between services (e.g., cache and database) may cause orphaned seat locks or incomplete transactions. Lastly, integrating with third-party payment gateways introduces external points of failure, potentially blocking or duplicating bookings if not handled idempotently.






Future improvements

To mitigate these issues, future improvements should focus on robust sharding and replication for databases, using persistent and distributed caching layers, and architecting all services to be stateless and redundant with health checks and autoscaling. Monitoring queue backlogs, employing autoscaling consumers, and adding dead-letter queues ensure resilience against spikes and failures. Implementing idempotent workflows for booking and payments prevents duplicate transactions. Regular synchronization and eventual consistency techniques between cache and database help resolve inconsistencies, while comprehensive observability tools enable teams to detect, alert, and resolve incidents proactively, maintaining high reliability and user satisfaction.
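The idempotent payment workflow mentioned above can be illustrated with an idempotency key supplied by the client. This is a single-process sketch: `IdempotentPayments` and its in-memory dictionary are hypothetical, and a real deployment would need a shared store with a uniqueness guarantee so that concurrent retries cannot both charge.

```python
class IdempotentPayments:
    """Remember the outcome per idempotency key so retries do not charge twice."""

    def __init__(self, charge):
        self._charge = charge
        self._results = {}  # idempotency_key -> stored result

    def pay(self, idempotency_key, order):
        if idempotency_key in self._results:
            # A retry of a request already handled: replay the stored result.
            return self._results[idempotency_key]
        result = self._charge(order)
        self._results[idempotency_key] = result
        return result

charges = []

def charge(order):
    # Hypothetical gateway call; records each real charge for the demo.
    charges.append(order["order_id"])
    return {"payment_id": f"pay-{len(charges)}", "order_id": order["order_id"]}

payments = IdempotentPayments(charge)
order = {"order_id": "ord-1", "amount_cents": 12_000}

first = payments.pay("key-abc", order)
retry = payments.pay("key-abc", order)   # e.g. the client timed out and resent
print(first == retry, len(charges))
```

The retry returns the original result without touching the gateway again, which is what prevents duplicate bookings when a client or network retry repeats a request.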