System requirements
Functional:
- The user can park a vehicle: truck, car, van, motorcycle.
- Users can see available free lots of every type before starting parking.
- Disabled users can park their vehicles.
- The system should allocate the parking lots optimally: by price or by space.
- Parking lots may be either small, medium, or large.
- The parking facility has a few entrances and exits and a few floors.
- Users can buy tickets at the ticket machine by choosing the time interval and the transportation type.
Non-Functional:
- The system should be highly available and keep processing user requests even if one entrance is broken.
- The system should process a user request with a response time lower than 100 ms.
- The system should scale to additional entrances and exits when the user request rate is high.
- The system should be strongly consistent.
Capacity estimation
Let's assume that customers free all the parking lots every hour, i.e. every lot turns over once per hour.
Car information:
plate (10 bytes)
brand (30 bytes)
model (100 bytes)
Parking lot information:
parking lot id (8 bytes)
type (1 byte)
status (1 byte: free, reserved, occupied)
for disabled (1 byte)
start date (8 bytes)
end date (8 bytes)
Let's say we have 300 small, 200 medium, and 100 large parking lots. A truck may take 3 small or 2 medium parking lots if large ones aren't available.
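At about 150 bytes per record, one full turnover generates 150 bytes * 600 = 90 KB, or about 100 KB per hour. That is about 100 KB * 24 = 2.4 MB, or roughly 2.5 MB per day, which comes to about 1 GB per year and 5 GB over 5 years.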
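The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures from the text (150 bytes per record, 600 lots, one turnover per hour); the constant names are made up for illustration. The exact products come out slightly below the rounded figures, which is fine for a back-of-the-envelope estimate.

```python
# Back-of-the-envelope storage estimate using the figures from the text.
RECORD_BYTES = 150            # rounded size of one parking record
LOTS = 300 + 200 + 100        # small + medium + large
TURNOVERS_PER_DAY = 24        # every lot is freed once per hour

bytes_per_hour = RECORD_BYTES * LOTS
bytes_per_day = bytes_per_hour * TURNOVERS_PER_DAY
bytes_per_year = bytes_per_day * 365
bytes_5_years = bytes_per_year * 5

print(f"per hour:  {bytes_per_hour / 1e3:.0f} KB")
print(f"per day:   {bytes_per_day / 1e6:.2f} MB")
print(f"per year:  {bytes_per_year / 1e9:.2f} GB")
print(f"5 years:   {bytes_5_years / 1e9:.2f} GB")
```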
API design
park(apiKey, car plate, type of transportation, start time, end time): parks a car if an available lot exists; otherwise, returns an error.
unpark(apiKey, ticketId): frees the parking lot; the user pays for the parking time with a bank card or cash.
getParkingLots(apiKey, type of transportation=[], typeOfParkingLot (free, occupied, reserved), offset, limit): gets the parking lots with paging.
pay(apiKey, ticketId, paymentDetails): pays for the parking lot in one of several ways: bank card or cash.
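To make the shape of these calls concrete, here is a minimal in-memory sketch of park, unpark, and the paged listing. It is an illustration under assumptions, not the real service: the class ParkingApi, the Lot record, and the integer ticket ids are hypothetical, apiKey checks and payment are left out, and the caller passes the lot type directly instead of a transportation type.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Lot:
    lot_id: int
    lot_type: str            # "small", "medium" or "large"
    status: str = "free"     # "free", "reserved" or "occupied"


class ParkingApi:
    """Hypothetical in-memory stand-in for the park/unpark/getParkingLots calls."""

    def __init__(self, lots):
        self.lots = {lot.lot_id: lot for lot in lots}
        self.tickets = {}                 # ticket_id -> (lot_id, plate, start, end)
        self._next_ticket = count(1)

    def park(self, plate, lot_type, start, end):
        # Take the first free lot of the requested type, or report an error.
        for lot in self.lots.values():
            if lot.lot_type == lot_type and lot.status == "free":
                lot.status = "occupied"
                ticket_id = next(self._next_ticket)
                self.tickets[ticket_id] = (lot.lot_id, plate, start, end)
                return ticket_id
        raise LookupError(f"no free {lot_type} lot")

    def unpark(self, ticket_id):
        # Raises KeyError for an unknown ticket id.
        lot_id, _plate, _start, _end = self.tickets.pop(ticket_id)
        self.lots[lot_id].status = "free"
        return lot_id

    def get_parking_lots(self, status, offset=0, limit=10):
        # Offset/limit paging over the lots with the given status.
        matching = [lot for lot in self.lots.values() if lot.status == status]
        return matching[offset:offset + limit]
```

For example, parking a car in a facility with lots 1 and 2 (small) and 3 (large) returns ticket 1 and leaves lots 2 and 3 in the free listing.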
Database design
The primary data model for this system consists of two tables: one for cars and the other for the parking lots occupied by those cars.
It requires strong consistency. Once a parking lot is occupied or freed, the change must be visible to every reader.
An RDBMS such as PostgreSQL would fit the bill. It is performant and strongly consistent, and its reads can be scaled horizontally with replicas.
Car table:
car plate (10 bytes)
brand (30 bytes)
model (100 bytes)
Parking lot table:
parking lot id (8 bytes)
car plate (10 bytes)
type (1 byte: small, medium, or large)
status (1 byte: free, reserved, occupied)
for disabled (1 byte)
start date (8 bytes)
end date (8 bytes)
latitude (2 bytes)
longitude (2 bytes)
We may choose to have some secondary indexes on the parking lot table: status, type, start date, and end date. For example, when a user wants to park a car, the system searches by the status and type of the parking lots.
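The two tables and one such index could look roughly like this. The sketch uses sqlite3 from the Python standard library as a stand-in for Postgres so that it runs anywhere; the table and column names, the column types, and the choice of a single composite index on (status, type) are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (
    plate TEXT PRIMARY KEY,
    brand TEXT,
    model TEXT
);
CREATE TABLE parking_lot (
    id           INTEGER PRIMARY KEY,
    car_plate    TEXT REFERENCES car(plate),
    type         TEXT NOT NULL CHECK (type IN ('small', 'medium', 'large')),
    status       TEXT NOT NULL CHECK (status IN ('free', 'reserved', 'occupied')),
    for_disabled INTEGER NOT NULL DEFAULT 0,
    start_date   INTEGER,
    end_date     INTEGER,
    latitude     REAL,
    longitude    REAL
);
-- Composite secondary index for the "free lots of a given type" lookup.
CREATE INDEX idx_lot_status_type ON parking_lot (status, type);
""")
conn.executemany(
    "INSERT INTO parking_lot (id, type, status) VALUES (?, ?, ?)",
    [(1, "small", "occupied"), (2, "small", "free"), (3, "large", "free")],
)


def free_lots(conn, lot_type):
    """Return the ids of free lots of one type, using the (status, type) index."""
    rows = conn.execute(
        "SELECT id FROM parking_lot WHERE status = 'free' AND type = ? ORDER BY id",
        (lot_type,),
    )
    return [row[0] for row in rows]
```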
High-level design
See the diagram of the high-level architecture.
The API Gateway provides DDoS protection and TLS termination, and forwards requests to the right service nodes.
We have two microservices. The parking service specializes in finding the optimal free parking slots for the user's vehicle. First, the parking service looks in the Redis cache for available slots; if there are none, it goes to the DB. If a user calls the park/unpark endpoints, the parking service updates this information in the DB. The parking service keeps a pool of available parking lots grouped by transportation type and distance so that it can process user requests quickly.
We will use LRU eviction to keep the Redis cache at a bounded size.
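The cache-first lookup with LRU eviction can be sketched in plain Python. This is an illustration only: LruCache is a small local class standing in for Redis (where eviction would be configured on the server, for example with the allkeys-lru policy), and the db argument is a plain dict standing in for the database.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache; a local stand-in for a Redis instance with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)           # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)    # evict the least recently used key


def free_slots(cache, db, lot_type):
    """Cache-aside read: try the cache first, fall back to the DB on a miss."""
    slots = cache.get(lot_type)
    if slots is None:
        slots = db[lot_type]                  # db is a dict standing in for Postgres
        cache.put(lot_type, slots)
    return slots
```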
Payment service processes the user's payments by using an external payment gateway like Stripe.
Request flows
park(...) API:
The client sends the request. The API Gateway forwards it to the parking service. The parking service reads the ready list of available parking lots from the Redis cache. If it can't find a suitable parking lot there, it queries the DB for the necessary parking slot and marks this slot as occupied. The parking service sorts all free parking slots by transportation type, distance, and whether they are reserved for the disabled.
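One way to keep the "mark as occupied" step strongly consistent is a conditional update that only succeeds while the slot is still free, so two concurrent requests cannot claim the same slot. The sketch below illustrates that idea with sqlite3 standing in for Postgres; the slot table and the functions claim_slot and park are hypothetical names, and the cached candidates are passed in as a plain list.

```python
import sqlite3


def claim_slot(conn, slot_id, plate):
    """Mark a slot occupied only if it is still free; return True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE slot SET status = 'occupied', car_plate = ? "
            "WHERE id = ? AND status = 'free'",
            (plate, slot_id),
        )
    return cur.rowcount == 1


def park(conn, cached_free_ids, plate):
    """Try the cached candidates first, then fall back to querying the DB."""
    for slot_id in cached_free_ids:
        if claim_slot(conn, slot_id, plate):
            return slot_id
    rows = conn.execute(
        "SELECT id FROM slot WHERE status = 'free' ORDER BY id"
    ).fetchall()
    for (slot_id,) in rows:
        if claim_slot(conn, slot_id, plate):
            return slot_id
    return None  # nothing free: the API would return an error


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slot (id INTEGER PRIMARY KEY, status TEXT, car_plate TEXT)")
conn.executemany(
    "INSERT INTO slot (id, status) VALUES (?, ?)", [(1, "occupied"), (2, "free")]
)
```

If the cache still lists slot 1 as free although it is already occupied, the conditional update rejects it and the request falls through to the DB query, which finds slot 2.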
unpark(...) API:
The client sends the request. The API Gateway forwards it to the parking service. The parking service marks the occupied parking slot as free, updates its state in the DB, and puts the parking slot back into the Redis cache.
pay(...) API:
The client sends a request to the payment service to pay for the parking before leaving the facility. The payment service uses the external payment gateway to process the client's payment.
Detailed component design
The performance and scalability of the park() and unpark() APIs are extremely important for this system. As such, we employ a Redis cache to reduce the response time when a user is about to park a car.
Keeping the available parking lots sorted by transportation type, parking lot type, and distance from the parking entrance lets the service pick a free parking lot without scanning all of them.
We can store these sorted parking slots in the Redis cache, and create and maintain DB indexes so that the data in the DB can be accessed quickly as well. Before the system starts, we're going to warm up the cache with the available parking slots.
The pay() API delegates the payment processing to the external payment gateway.
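A sorted pool of free slots with a warm-up step might look like the sketch below. It keeps one min-heap per lot type, ordered by distance from the entrance, in local memory; in Redis the same idea would map naturally onto sorted sets (ZADD to add a slot, ZPOPMIN to take the nearest one). The function names and the (slot id, lot type, distance) tuples are assumptions for illustration.

```python
import heapq
from collections import defaultdict


def warm_up(free_slots):
    """Build one min-heap per lot type, ordered by distance from the entrance."""
    pools = defaultdict(list)
    for slot_id, lot_type, distance_m in free_slots:
        heapq.heappush(pools[lot_type], (distance_m, slot_id))
    return pools


def take_nearest(pools, lot_type):
    """Pop the closest free slot of the given type, or None if none is left."""
    heap = pools.get(lot_type)
    if not heap:
        return None
    _distance, slot_id = heapq.heappop(heap)
    return slot_id


def release(pools, slot_id, lot_type, distance_m):
    """Return a slot to the pool when a car leaves."""
    heapq.heappush(pools[lot_type], (distance_m, slot_id))
```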
Partitioning
The database and the cache should be partitioned for improved scalability.
The park() and unpark() data may be partitioned by transportation type. This wouldn't give us an even distribution, but the system load is low, so the skew has no negative effect on system performance.
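Partitioning by transportation type can be as simple as a fixed lookup table. The sketch below is hypothetical: the assignment of vehicle types to three partitions is made up, and the helper only counts how many requests land on each partition to show the uneven distribution mentioned above.

```python
from collections import Counter

# Hypothetical assignment of vehicle types to partitions.
PARTITIONS = {"motorcycle": 0, "car": 1, "van": 1, "truck": 2}


def partition_for(vehicle_type):
    """Map a vehicle type to the partition that stores its parking data."""
    try:
        return PARTITIONS[vehicle_type]
    except KeyError:
        raise ValueError(f"unknown vehicle type: {vehicle_type}") from None


def load_per_partition(requests):
    """Count how many requests each partition would receive."""
    return Counter(partition_for(vehicle) for vehicle in requests)
```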
Trade-offs/Tech choices
We need to develop a strategy for placing the different types of vehicles so that the parking lots are used efficiently. If there aren't enough parking lots of some type, we may use other parking lots instead; for example, a truck may take 3 adjacent small parking lots.
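The truck fallback could be sketched as a search for a run of adjacent free lots. The code below assumes that consecutive lot ids mean physically adjacent lots, which is an assumption of this sketch and not something the design states. The preference order (one large, then 2 medium, then 3 small) is also just one possible choice.

```python
def find_adjacent_run(free_ids, needed):
    """Return `needed` consecutive lot ids taken from free_ids, or None."""
    run = []
    for lot_id in sorted(free_ids):
        if run and lot_id == run[-1] + 1:
            run.append(lot_id)        # extends the current run of neighbours
        else:
            run = [lot_id]            # gap found: start a new run
        if len(run) == needed:
            return run
    return None


def place_truck(free_by_type):
    """Prefer one large lot, then 2 adjacent medium lots, then 3 adjacent small lots."""
    for lot_type, needed in (("large", 1), ("medium", 2), ("small", 3)):
        run = find_adjacent_run(free_by_type.get(lot_type, []), needed)
        if run:
            return lot_type, run
    return None
```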
We should accommodate disabled users and reserve parking lots of each type for them. We should also plan for electrical blackouts, during which users may pay for parking with checks.
TicketId should include parking lot number, parking lot type, start date, and duration.
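One possible encoding of such a ticket id is a delimited string that can be decoded again at the exit. The format below (lot number, one-letter type code, start timestamp, duration in minutes) is an assumption made for illustration; a real ticket would likely also need a checksum or signature so that it cannot be forged.

```python
from datetime import datetime, timedelta

TYPE_CODES = {"small": "S", "medium": "M", "large": "L"}
CODE_TYPES = {code: name for name, code in TYPE_CODES.items()}
STAMP = "%Y%m%dT%H%M"   # start time with minute precision


def encode_ticket(lot_number, lot_type, start, duration):
    """Pack lot number, lot type, start time and duration into one string."""
    minutes = int(duration.total_seconds() // 60)
    return f"{lot_number}-{TYPE_CODES[lot_type]}-{start.strftime(STAMP)}-{minutes}"


def decode_ticket(ticket_id):
    """Inverse of encode_ticket."""
    lot_number, code, stamp, minutes = ticket_id.split("-")
    return (
        int(lot_number),
        CODE_TYPES[code],
        datetime.strptime(stamp, STAMP),
        timedelta(minutes=int(minutes)),
    )
```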
Failure scenarios/bottlenecks
Fault Tolerance
All the components - Load Balancers, Web Servers, Cache and Database should have multiple instances for improved availability. There should be robust monitoring and alerting systems on them.
All nodes can fail. Let's look at important failure cases.
Failure in Parking Service
If the Parking Service fails (hardware failure, crash, software bug, network partition, slowness ...), it would directly impact the most time-sensitive operation of this system, i.e., park(). To mitigate this, we should always run multiple Parking Service nodes. It is a stateless service, so we can run multiple nodes of the same service. We can use a coordination service, e.g., ZooKeeper, to track which nodes are alive (i.e. sending regular heartbeats to ZooKeeper), which are likely dead (i.e. not sending heartbeats for some time), and which nodes should be taking requests.
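The heartbeat idea can be illustrated without a real coordination service. The sketch below is a local stand-in, not ZooKeeper's API: the class HeartbeatRegistry is hypothetical, and time is passed in explicitly so that the behaviour is easy to follow.

```python
class HeartbeatRegistry:
    """Tracks the last heartbeat of each node and reports which ones look alive."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}              # node name -> time of last heartbeat

    def beat(self, node, now):
        self._last_seen[node] = now

    def alive(self, now):
        # A node counts as alive if it sent a heartbeat within the timeout.
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen <= self.timeout_s
        )
```

A load balancer would route requests only to the nodes returned by alive(), so a node that stops sending heartbeats drops out of rotation after the timeout.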
Failure in Cache
Losing the cache would also impact the park and unpark APIs. For example, let's say a Redis cache node that holds 10% of the parking slots goes down due to faulty memory. The Parking Service will now have to access the database for each of these parking slots. This makes requests much slower. The increased load on the database may even have a cascading impact: the database gets slower and slower, Parking Service nodes retry, making the database even busier, ultimately resulting in a database crash.
We have multiple mitigations.
a. Create read replicas for Redis Cache. Let's say for 1 leader, we put 2 read-only replicas. Writes are handled by the leader, and propagated to read replicas by transmitting a write log. Reads can be handled by all three. If the leader goes down for some reason, one of the read replicas can become the leader (after a leader selection process) and take over the responsibility as the leader. This would avoid the aforementioned scenario.
b. Parking Service should have a mitigation strategy to avoid overloading the database. For example, exponential backoff before retrying, rate-limiting, and circuit-breaking.
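Two of the mitigations in (b) can be sketched briefly. The code below shows exponential backoff with full jitter and a very small circuit breaker; the names backoff_delays and CircuitBreaker, the default values, and the explicit now argument are assumptions for illustration, not a prescribed implementation.

```python
import random


def backoff_delays(retries, base_s=0.1, cap_s=5.0, rng=random.random):
    """Full-jitter backoff: delay n is rng() * min(cap_s, base_s * 2**n)."""
    return [rng() * min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after reset_s."""

    def __init__(self, threshold, reset_s):
        self.threshold = threshold
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def allow(self, now):
        if self.opened_at is None:
            return True
        # Half-open: after the reset period, let a trial call through.
        return now - self.opened_at >= self.reset_s

    def record(self, success, now):
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now
```

The Parking Service would check allow() before each database call, skip the call while the circuit is open, and sleep for the next backoff delay between retries.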
To make the system more fault-tolerant, we deploy several instances of the payment and parking services. These services are stateless, so if one instance goes down, an orchestrator such as Kubernetes can start a new instance.
The DB is Postgres, which may have a few replicas. The leader processes the write requests and replicates the changes to the other DB instances. Here the DB leader is a single point of failure. To fix that, we can keep a standby leader ready for failover or use active-active (multi-master) replication.
Failure in Payment Service
The external payment gateway may fail, so we could integrate with several payment gateways.
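Falling back across several gateways might look like the sketch below. The gateways are plain callables here, and GatewayError and pay_with_failover are hypothetical names; nothing in this code is a real payment provider's API. A production version would also need idempotency keys so that a retried payment is never charged twice.

```python
class GatewayError(Exception):
    """Raised by a gateway callable when it cannot process the payment."""


def pay_with_failover(gateways, ticket_id, amount_cents):
    """Try each (name, charge) pair in order and return the first success."""
    errors = []
    for name, charge in gateways:
        try:
            return name, charge(ticket_id, amount_cents)
        except GatewayError as exc:
            errors.append(f"{name}: {exc}")   # remember why this gateway failed
    raise GatewayError("all gateways failed: " + "; ".join(errors))
```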
During a blackout, we would stop accepting new cars and only let the already parked cars leave, accepting checks as the payment method.
Scalability - park and unpark API:
As the number of write requests to park and unpark API increases, it might put too much pressure on the database, causing slowness, errors, or even crashes.
To avoid this, we can introduce a message queue to buffer the requests. The API Gateway would push a message representing the request into the message queue. The parking service would pull from the queue, process the request, and notify the API Gateway through a hook that the client's request has been processed. The system can deliver the result to the client with long polling.
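The buffering idea can be illustrated with an in-process queue. In the sketch below, queue.Queue from the Python standard library stands in for a real message broker, and the results dict stands in for whatever the client polls; enqueue and drain are hypothetical helper names.

```python
import queue


def enqueue(requests, request_id, payload):
    """Gateway side: buffer a request, or report that the queue is full."""
    try:
        requests.put_nowait((request_id, payload))
        return True
    except queue.Full:
        return False          # the gateway can ask the client to retry later


def drain(requests, handle, results):
    """Parking service side: process buffered requests and record each outcome."""
    while True:
        try:
            request_id, payload = requests.get_nowait()
        except queue.Empty:
            return
        results[request_id] = handle(payload)
```

A bounded queue gives natural backpressure: when it is full, new requests are rejected quickly instead of piling up on the database.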
Future improvements
Add metrics and collect analytics to predict peaks in user requests.