System requirements
Functional:
User registration, login, and profile management
Event search, filtering, and detailed event pages
Real-time seat selection and availability
Ticket reservation (temporary hold)
Secure payment and order confirmation
Ticket delivery (PDF/email/QR code)
Admin portal for event and seat management
Support for refunds/cancellations/transfers
Non-Functional:
High availability (99.99% uptime)
Low latency (sub-second seat updates)
Scalability for millions of users and high concurrency (flash sales)
Consistency (especially for seat availability)
Security (payment, user data, anti-fraud)
Observability (logging, monitoring, alerting)
Disaster recovery (data backup, failover)
Capacity estimation
Assume 10M users, 100K concurrent during big sales/events
Peak: 20K ticket purchases/minute
Each event: up to 70K seats (large stadium), 1K+ events simultaneously
DB: 100M+ tickets/year, 50M+ users, 1B+ seat transactions/year
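The estimates above can be turned into per-second rates with a little arithmetic. The sketch below only rearranges the figures already listed; the variable names are illustrative, and the seat count is an upper bound that assumes every simultaneous event is stadium-sized.

```python
# Back-of-the-envelope rates derived from the estimates listed above.
PEAK_PURCHASES_PER_MINUTE = 20_000
SEATS_PER_LARGE_EVENT = 70_000
SIMULTANEOUS_EVENTS = 1_000
TICKETS_PER_YEAR = 100_000_000
SEAT_TRANSACTIONS_PER_YEAR = 1_000_000_000

SECONDS_PER_YEAR = 365 * 24 * 3600

# Peak write rate the reservation/payment path must sustain.
peak_purchases_per_second = PEAK_PURCHASES_PER_MINUTE / 60

# Upper bound on seat keys that may be live in the cache at once
# (assumes every simultaneous event has 70K seats).
max_live_seats = SEATS_PER_LARGE_EVENT * SIMULTANEOUS_EVENTS

# Yearly volumes expressed as average per-second rates.
avg_tickets_per_second = TICKETS_PER_YEAR / SECONDS_PER_YEAR
avg_seat_tx_per_second = SEAT_TRANSACTIONS_PER_YEAR / SECONDS_PER_YEAR

# How much burstier a flash sale is than the yearly average.
peak_to_average = peak_purchases_per_second / avg_tickets_per_second

print(f"peak purchases/sec: {peak_purchases_per_second:.0f}")
print(f"max live seats:     {max_live_seats:,}")
print(f"avg tickets/sec:    {avg_tickets_per_second:.2f}")
print(f"avg seat tx/sec:    {avg_seat_tx_per_second:.2f}")
print(f"peak/average ratio: {peak_to_average:.0f}x")
```

The peak works out to roughly 333 purchases per second against a yearly average of about 3 tickets per second, so the system should be sized for bursts around two orders of magnitude above the average rather than for the average itself.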
API design
Define what APIs are expected from the system...
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
High-level design
Frontend: Client app (web/mobile)
Backend: Event service, User service, Ticketing service, Payment service
Database: Relational DB for transactions, NoSQL for event catalog
Cache: Redis for seat availability and short-lived seat holds, to prevent overselling
Concurrency control: Optimistic locking or atomic seat reservation
Queue: For order processing, ticket delivery, notifications
CDN: For static content & seat maps
Monitoring: Prometheus, Grafana, alerting tools
Request flows
User Browses Events:
Client sends a request to the API gateway for event listing.
API gateway routes to the Event Service.
Event Service queries the Event Catalog Database or Cache.
Results returned to client.
User Selects an Event and Seats:
Client requests seat map for a specific event.
API gateway routes to Ticketing Service.
Ticketing Service fetches seat availability from a fast cache (Redis).
Seat map and availability sent to client.
User Initiates Booking:
Client sends seat selection and booking request.
API gateway forwards to Ticketing/Reservation Service.
Reservation Service attempts to atomically reserve seat(s) in Redis (using an atomic operation such as SETNX, or SET with the NX and EX options so that the hold and its expiry are set in one step).
If successful: seat temporarily held for user.
If failed: user is informed seat is not available.
Reservation Service returns hold confirmation to client.
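The booking steps above can be sketched as follows. To keep the example runnable without a Redis server, it uses a small in-memory class that imitates "set if absent, with expiry" semantics; the class, the key format, and the 600-second hold are all assumptions for illustration. A real Redis server executes each command atomically, whereas this stand-in is only safe in a single thread.

```python
import time


class SeatHoldStore:
    """In-memory stand-in for Redis-style 'SET key value NX EX ttl'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._holds = {}  # key -> (user_id, expires_at)

    def _purge_if_expired(self, key):
        entry = self._holds.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._holds[key]

    def set_if_absent(self, key, user_id, ttl_seconds):
        # Succeeds only when no live hold exists for the key.
        self._purge_if_expired(key)
        if key in self._holds:
            return False
        self._holds[key] = (user_id, self._clock() + ttl_seconds)
        return True

    def release(self, key, user_id):
        # Only the current holder may release the hold.
        self._purge_if_expired(key)
        entry = self._holds.get(key)
        if entry is not None and entry[0] == user_id:
            del self._holds[key]
            return True
        return False


def seat_key(event_id, seat_id):
    # Hypothetical key format combining event and seat ID.
    return f"hold:{event_id}:{seat_id}"


def reserve_seats(store, event_id, seat_ids, user_id, ttl_seconds=600):
    """All-or-nothing hold: roll back partial holds if any seat is taken."""
    acquired = []
    for seat_id in seat_ids:
        key = seat_key(event_id, seat_id)
        if store.set_if_absent(key, user_id, ttl_seconds):
            acquired.append(key)
        else:
            for held in acquired:
                store.release(held, user_id)
            return False
    return True


now = [0.0]  # fake clock so expiry can be shown without sleeping
store = SeatHoldStore(clock=lambda: now[0])

alice_ok = reserve_seats(store, "evt1", ["A1", "A2"], "alice")
bob_ok = reserve_seats(store, "evt1", ["A3", "A2"], "bob")  # A2 is held
carol_ok = reserve_seats(store, "evt1", ["A3"], "carol")    # A3 was rolled back
now[0] = 601.0                                              # alice's hold expires
bob_retry_ok = reserve_seats(store, "evt1", ["A1"], "bob")
print(alice_ok, bob_ok, carol_ok, bob_retry_ok)
```

Bob's request fails as a whole and leaves no stray hold on A3, and Alice's seats become available again once the hold expires. Against a real Redis deployment the multi-seat loop is not atomic as a unit, so a server-side script or transaction would likely be needed to get the same all-or-nothing behaviour under concurrency.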
User Completes Payment:
Client submits payment details.
API gateway routes to Payment Service.
Payment Service processes via payment gateway.
On success: confirms booking, persists to database, triggers ticket generation and notification (via message queue).
On failure: releases seat hold in Redis.
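Ticket Delivery:
Notification Service consumes the booking confirmation message from the queue.
Notification Service generates the digital ticket (with QR/barcode).
Ticket is delivered to the user via email or SMS.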
Detailed component design
The core components for a scalable online ticketing platform are the Ticket Reservation Service, the Payment Service, and the Notification Service. The Ticket Reservation Service is responsible for managing real-time seat availability and handling atomic reservations. To achieve high concurrency and prevent overselling, this service uses Redis as an in-memory store and locks a seat with an atomic operation such as SETNX when a user attempts to reserve it. Each seat is represented as a unique key (combining event and seat ID), and a short TTL (Time To Live) is set on the lock so that it auto-releases if the payment is not completed and seats do not remain perpetually locked. The lock and its TTL should be written in a single command (SET with the NX and EX options): a SETNX followed by a separate EXPIRE can leave a lock with no expiry if the service crashes between the two calls.
The Payment Service is stateless and horizontally scalable, and it interfaces asynchronously with external payment gateways to process transactions. Once payment succeeds, it publishes a message to a distributed queue (such as Kafka or AWS SQS). This decouples ticket generation and notification from the payment workflow, which improves resilience and throughput. If payment fails, the Payment Service calls the Reservation Service to release the seat lock, making the seat available to others.
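A minimal sketch of how the Payment Service might branch on the gateway result is shown below. The standard-library queue stands in for Kafka or SQS, and the order fields, message shape, and callback names are hypothetical choices made for the example.

```python
import queue

# Stand-in for a distributed queue topic such as Kafka or SQS.
booking_events = queue.Queue()


def handle_payment_result(order, succeeded, publish, release_hold):
    """Publish a confirmation on success, release seat holds on failure."""
    if succeeded:
        publish({
            "type": "booking_confirmed",  # hypothetical message type
            "order_id": order["order_id"],
            "event_id": order["event_id"],
            "seats": list(order["seats"]),
            "user_id": order["user_id"],
        })
        return "confirmed"
    # Payment failed: hand every held seat back to the pool.
    for seat_id in order["seats"]:
        release_hold(order["event_id"], seat_id, order["user_id"])
    return "released"


released = []  # records release calls in place of a Reservation Service


def fake_release(event_id, seat_id, user_id):
    released.append((event_id, seat_id, user_id))


order = {
    "order_id": "o-1",
    "event_id": "evt1",
    "seats": ["A1", "A2"],
    "user_id": "alice",
}

ok = handle_payment_result(order, True, booking_events.put, fake_release)
failed = handle_payment_result(
    {**order, "order_id": "o-2"}, False, booking_events.put, fake_release
)
print(ok, failed, booking_events.qsize(), released)
```

Passing the publish and release functions in as parameters keeps the handler free of any client library, which fits the stateless design described above and makes the branch logic easy to test in isolation.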
The Notification Service consumes messages from the queue for successful bookings, generating digital tickets (with QR/barcode) and delivering them via email or SMS. As this service is also stateless, it can run multiple consumers in parallel to handle high notification volumes during peak sales. Each component relies on robust monitoring, retry logic, and dead-letter queues to ensure reliability and fault tolerance, allowing the entire system to handle heavy traffic, maintain consistency, and recover gracefully from partial failures. This modular, event-driven approach enables each component to scale independently and ensures a seamless user experience even under peak loads.
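The retry and dead-letter behaviour mentioned above could look roughly like the sketch below. It uses in-process queues in place of a real broker, and the attempt limit, message fields, and the failing delivery function are assumptions for the example. A production consumer would normally also wait between retries, which is omitted here.

```python
import queue

MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def drain(messages, dead_letters, deliver, max_attempts=MAX_ATTEMPTS):
    """Consume until the queue is empty, retrying failures up to a limit."""
    delivered = []
    while True:
        try:
            msg = messages.get_nowait()
        except queue.Empty:
            return delivered
        attempts = msg.get("attempts", 0) + 1
        try:
            deliver(msg)
            delivered.append(msg["order_id"])
        except Exception:
            retried = {**msg, "attempts": attempts}
            if attempts >= max_attempts:
                dead_letters.put(retried)  # give up, park for inspection
            else:
                messages.put(retried)      # requeue for another attempt


calls = {}


def deliver(msg):
    # Simulated delivery: "o-bad" always fails, "o-flaky" fails once.
    order_id = msg["order_id"]
    calls[order_id] = calls.get(order_id, 0) + 1
    if order_id == "o-bad" or (order_id == "o-flaky" and calls[order_id] == 1):
        raise RuntimeError("email provider unavailable")


messages = queue.Queue()
dead_letters = queue.Queue()
for order_id in ("o-ok", "o-flaky", "o-bad"):
    messages.put({"order_id": order_id})

delivered = drain(messages, dead_letters, deliver)
print(delivered, dead_letters.qsize())
```

The transient failure is absorbed by a retry, while the message that keeps failing ends up in the dead-letter queue instead of blocking the messages behind it.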
Trade offs/Tech choices
Consistency vs. availability: Prioritize strong consistency for seat reservation.
Scalability: Use sharding, caching for hot seats/events.
Performance: Cache reads, async processing for non-critical flows.
Failure scenarios/bottlenecks
Failure scenarios in an online ticketing platform can arise from several bottlenecks. The most critical is the database becoming a single point of contention, especially during flash sales, which can cause slowdowns or downtime if not properly sharded or replicated. Cache inconsistency between Redis and the main database may lead to incorrect seat availability and possible overselling, severely impacting user trust. If services like the reservation or payment module lack redundancy, their failure can bring down the entire booking flow. Message queues can also become bottlenecks if consumers are too slow or crash, delaying ticket issuance and notifications during high-traffic periods. Network partitions or connectivity loss between services (e.g., cache and database) may cause orphaned seat locks or incomplete transactions. Lastly, integrating with third-party payment gateways introduces external points of failure, potentially blocking or duplicating bookings if not handled idempotently.
Future improvements
To mitigate these issues, future improvements should focus on robust sharding and replication for databases, using persistent and distributed caching layers, and architecting all services to be stateless and redundant with health checks and autoscaling. Monitoring queue backlogs, employing autoscaling consumers, and adding dead-letter queues ensure resilience against spikes and failures. Implementing idempotent workflows for booking and payments prevents duplicate transactions. Regular synchronization and eventual consistency techniques between cache and database help resolve inconsistencies, while comprehensive observability tools enable teams to detect, alert, and resolve incidents proactively, maintaining high reliability and user satisfaction.
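One common way to make the payment step idempotent is to have the client send an idempotency key with each payment attempt and to return the stored result when the same key is seen again. The sketch below illustrates the idea with an in-memory dictionary; the class name, key format, and fake charge function are assumptions, and a real service would need a durable store with an atomic insert (for example a unique constraint) rather than a dictionary.

```python
class PaymentProcessor:
    """Deduplicates payment attempts by idempotency key (in-memory sketch)."""

    def __init__(self, charge):
        self._charge = charge  # function that actually calls the gateway
        self._results = {}     # idempotency_key -> stored result

    def pay(self, idempotency_key, amount_cents):
        # A retry with a known key returns the original result
        # instead of charging the customer again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._charge(amount_cents)
        self._results[idempotency_key] = result
        return result


charges = []  # records every call that reaches the fake gateway


def fake_charge(amount_cents):
    charges.append(amount_cents)
    return {"status": "ok", "charge_id": f"ch-{len(charges)}"}


processor = PaymentProcessor(fake_charge)

first = processor.pay("order-o-1", 12_500)
retry = processor.pay("order-o-1", 12_500)  # e.g. client timed out and retried
other = processor.pay("order-o-2", 8_000)
print(first, retry, other, len(charges))
```

The retried request returns the same charge as the first one, so the gateway is called once per order even when the client or network causes the request to be sent twice.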