System requirements


Functional:

List functional requirements for the system (Ask interviewer if stuck)...

  1. Reserve parking spots: Allow drivers to reserve a parking spot in advance.
  2. Release no-show reservations: Return a reserved spot to the pool of available spots if the driver does not show up.
  3. Keep track of available parking spots: The system needs to maintain real-time updates on the number of empty spots for different types of vehicles. A rough estimate is sometimes sufficient.
  4. Track check-in and check-out: Record the check-in and check-out processes at the gate.
  5. Record entering time: Capture and store the timestamp when a vehicle enters the parking lot to track duration.
  6. Identify available parking spots: Determine available spots based on the size of the vehicle upon entry.
  7. Handle full parking lot: Provide a message when no parking spot is available, directing the user to exit.
  8. Calculate parking fee: Calculate fees based on the duration a vehicle is parked in the lot.
  9. Accept payment: Allow drivers to make payments using various methods upon exiting the parking lot.
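Requirement 8 can be illustrated with a minimal sketch. The billing rule (charge per started hour), the use of Decimal for money, and the function name calculate_fee are assumptions for illustration; the requirements above do not fix a pricing policy.

```python
import math
from datetime import datetime, timedelta
from decimal import Decimal


def calculate_fee(entered_at: datetime, exited_at: datetime, hourly_rate: Decimal) -> Decimal:
    """Return the fee for a stay, charging for every started hour (assumed policy)."""
    if exited_at < entered_at:
        raise ValueError("exit time precedes entry time")
    seconds = (exited_at - entered_at).total_seconds()
    billable_hours = math.ceil(seconds / 3600)  # 1h30m is billed as 2 hours
    return hourly_rate * billable_hours
```

Using Decimal rather than float avoids rounding drift when fees are summed or compared against a payment amount.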


Non-Functional:

List non-functional requirements for the system...

  1. High availability: Ensure the system is operational and accessible at all times to avoid disruptions.
  2. Secure payment system: Implement robust security measures to protect payment transactions and user data.
  3. Data preservation: Store payment and vehicle entry/exit logs securely in a durable and reliable database for auditing and analysis purposes.
  4. Data Consistency: Ensure data consistency when multiple vehicles come in at the same time.
  5. Scalability: The system should be able to handle a growing number of parking lots and users over time.


API design

Define what APIs are expected from the system...


GET /v1/parking-lots/{lot_id}/availability

Query: vehicle_type: required, enum (Compact, Large, Motorcycle, Handicap, EV)

level_id (Optional): If the user wants a specific level.

Response:

lot_id (int)

status: (FULL, ALMOST_FULL, AVAILABLE)

breakdown: (dict from level to number of available spots)

vehicle_type: enum (Compact, Large, Motorcycle, Handicap, EV)
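As a rough sketch of how the status field could be derived from the per-level breakdown: the 10% threshold for ALMOST_FULL and the function name summarize_availability are assumptions, not part of the API definition above.

```python
from typing import Dict


def summarize_availability(breakdown: Dict[str, int], total_spots: int,
                           almost_full_ratio: float = 0.1) -> str:
    """Map a level -> free-spot breakdown to FULL / ALMOST_FULL / AVAILABLE."""
    free = sum(breakdown.values())
    if free == 0:
        return "FULL"
    # Assumed rule: at or below 10% of capacity counts as almost full.
    if free <= total_spots * almost_full_ratio:
        return "ALMOST_FULL"
    return "AVAILABLE"
```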


POST /v1/parking-lots/{lot_id}/reservations

Idempotency-Key: (To identify retries and prevent double booking)

Body: vehicle_type: required, enum (Compact, Large, Motorcycle, Handicap, EV)

user_id: string

license_plate_number: string

start_time: timestamp

end_time: timestamp

Response body:

reservation_id: string

estimated_total_fees: float

expires_at: timestamp (the entry deadline: the client must enter the garage by this time; otherwise the reservation expires and the spot is released).

status: enum (Should be Pending at this time)

// On failure, return 404 if no spots are available, or 400 if part of the request is invalid (e.g., end_time before start_time, level_id does not exist).


POST /v1/parking-lots/{lot_id}/enter

Body: license_plate_number: string

reservation_id: string

gate_id: string

response body: (If succeeded, returns the reservation details)

ticket_id: string

reservation_id: string

level_id: string

rate_card: (Object) {hourly_rate: float, currency: enum (e.g., USD)}

entrance_time: timestamp

status: enum (should be InProgress)

// On failure: 404 if the reservation does not exist; 400 if license_plate_number does not match the reservation_id; 410 if the reservation has expired; 409 if the vehicle has already entered the garage (prevents double processing).
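The error cases for the enter endpoint can be sketched as a single validation function. The Reservation dataclass, the check order, and the name check_entry are illustrative assumptions; only the status codes come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Reservation:
    reservation_id: str
    license_plate_number: str
    expires_at: datetime
    status: str  # e.g. "Pending" or "InProgress"


def check_entry(reservation: Optional[Reservation], license_plate_number: str,
                now: datetime) -> int:
    """Return the HTTP status code the enter endpoint would respond with."""
    if reservation is None:
        return 404  # reservation does not exist
    if reservation.license_plate_number != license_plate_number:
        return 400  # plate does not match the reservation
    if reservation.status == "InProgress":
        return 409  # vehicle already entered; do not process twice
    if now > reservation.expires_at:
        return 410  # entry deadline has passed
    return 200
```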


POST /v1/parking-lots/{lot_id}/payments

Body: ticket_id: string

payment_method: (enum)

card_number: (required if payment_method is credit or debit)

payment: (amount: float, currency: enum)

response: payment_id: string

status: (succeeded)

exit_deadline: timestamp


POST /v1/parking-lots/{lot_id}/exit

Body: ticket_id: string

gate_id: string

response: status: succeeded


POST /v1/parking-lots/{lot_id}/kiosk-exit

Body: ticket_id: string

payment_method: (enum)

card_number: (required if payment_method is credit or debit)

payment: (amount: float, currency: enum)

gate_id: string

response: payment_id: string

payment_status: succeeded


// If payment keeps failing at exit, there should be a fallback to ask the customer to pay later (sending a bill)


High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...

This system contains the following components:

Client - user-facing UI machines at entrances and exits of the Parking lot.

API Gateway / Web Application Firewall: This component handles rate limiting and security. For example, it can keep the service running when a burst of traffic comes in.

Load Balancer: The load balancer makes sure that traffic is distributed across the servers.

Redis Cache - A cache layer used for common reads. It should be an advisory cache that is used only for reads, giving a fast experience for popular parking lots. All writes should be handled by the database directly; once a write completes in the database, the cache can be updated accordingly.

Server - The main business logic lives here.

Database - Holds a table that tracks the number of available parking spots, as well as recent parking and payment records.

External Payment System - A third-party payment system that takes payments (e.g., Square).

Delayed Message Queue - Used for handling reservation expiration. Whenever a reservation is made, the service should publish a message to this queue; the message should become visible only when the reservation is about to expire. When handling the message, the spot should be released only if the reservation has indeed expired (that is, it is still pending).
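The expiration flow can be sketched with an in-memory stand-in for the delayed queue. The DelayedQueue class, the dict-based reservation store, and handle_expiry are hypothetical; a real deployment would use a managed queue with delayed delivery.

```python
import heapq
from typing import Dict, List, Tuple


class DelayedQueue:
    """In-memory stand-in: messages become visible once their time arrives."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def publish(self, visible_at: float, reservation_id: str) -> None:
        heapq.heappush(self._heap, (visible_at, reservation_id))

    def poll(self, now: float) -> List[str]:
        """Pop and return every reservation id whose message is due."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


def handle_expiry(reservations: Dict[str, dict], available: Dict[str, int],
                  reservation_id: str) -> bool:
    """Release the spot only if the reservation is still pending."""
    res = reservations[reservation_id]
    if res["status"] != "Pending":
        return False  # driver already entered, nothing to release
    res["status"] = "Expired"
    available[res["vehicle_type"]] += 1
    return True
```

Checking the status inside the handler is what makes a late message harmless: a driver who has already entered keeps the spot.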

Reconciliation Job - Runs every 5 minutes, scans the database for records stuck at PENDING_PAYMENT for too long (say, 10 minutes), and queries the PSP endpoint to confirm their state.

Cron machine - Runs a cron job that moves old records (probably those older than 1 year) to cold storage, after which they can be deleted from the database.

Cold Storage - Low cost storage to store old record, as they're seldom accessed.


User flow: When a user searches for availability at a parking lot, the request reaches the API Gateway, and the load balancer dispatches it to one of the application servers. The server first checks the Redis cache: on a cache hit it returns the available count from the cache; otherwise it reads from the database and writes the result to the cache. Availability should come from a pre-computed counter table rather than from counting rows in real time during the read.
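The read path follows the cache-aside pattern. The sketch below uses a plain dict in place of Redis; the AvailabilityReader class, the 5-second TTL, and the load_from_db callback are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class AvailabilityReader:
    """Cache-aside read: serve from cache on a hit, else load and populate."""

    def __init__(self, load_from_db: Callable[[int, str], int],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[Tuple[int, str], Tuple[int, float]] = {}

    def get(self, lot_id: int, vehicle_type: str) -> int:
        key = (lot_id, vehicle_type)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, still fresh
        value = self._load_from_db(lot_id, vehicle_type)  # reads the counter table
        self._cache[key] = (value, now + self._ttl)
        return value
```

A short TTL bounds how stale the advertised count can be, which is acceptable here because the database remains the authority at booking time.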

When a user decides to reserve a spot, the request again passes through the API Gateway and load balancer to an application server. The application first attempts to decrement the available count in Redis; only if a spot was available there (the count was above zero) does it proceed to the database. The application server then performs an atomic update in a single transaction: it decrements the available spots in the parking lot table, and creates a row in the reservation table only if that update succeeds. If the update fails, the server returns a 404. If the transaction succeeds, the application server updates the cache. Upon reservation, the service also publishes a message to the delayed message queue, which handles no-shows.
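The conditional decrement plus insert can be sketched with SQLite from the Python standard library. The table names spot_counts and reservations, their columns, and the function reserve_spot are assumptions; the production database and schema may differ.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("CREATE TABLE spot_counts (lot_id INTEGER, vehicle_type TEXT, "
                     "available INTEGER, PRIMARY KEY (lot_id, vehicle_type))")
        conn.execute("CREATE TABLE reservations (id TEXT PRIMARY KEY, lot_id INTEGER, "
                     "vehicle_type TEXT, status TEXT)")


def reserve_spot(conn: sqlite3.Connection, lot_id: int, vehicle_type: str,
                 reservation_id: str) -> bool:
    """Decrement the counter and create the reservation in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE spot_counts SET available = available - 1 "
            "WHERE lot_id = ? AND vehicle_type = ? AND available > 0",
            (lot_id, vehicle_type),
        )
        if cur.rowcount == 0:
            return False  # no spot left: the caller maps this to the error response
        conn.execute(
            "INSERT INTO reservations (id, lot_id, vehicle_type, status) "
            "VALUES (?, ?, ?, 'PENDING')",
            (reservation_id, lot_id, vehicle_type),
        )
    return True
```

Because the availability check lives in the UPDATE's WHERE clause, two concurrent requests cannot both take the last spot, and a failed insert rolls the decrement back.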

When a user enters the parking lot, the application server only needs to update the reservation table in the database.

When the user makes a payment, the application server hands it to the external payment system and then updates the database.

A spot is released either by the delayed message queue or when the customer leaves the parking lot. In both cases, the application server uses the same mechanism as for reservations: a transaction that updates both the available spot count and the reservation record.


Payment processing: The service does not process payments itself; it uses an external payment service provider (PSP). This avoids operational and compliance overhead.

In general, the system interacts with the PSP in two ways: asynchronously and synchronously.

The asynchronous path is used when a user pays at a kiosk before exiting the parking lot. The application service sends the payment request (along with an idempotency key) to the PSP and immediately responds with 202 Accepted. Once the PSP has finished processing the payment, it is expected to call a webhook in the service. The webhook handler updates the reservation's status in the database.

The synchronous path is used when the user pays at the exit: the service calls a synchronous PSP endpoint and waits for the response. If the payment fails or times out, the service returns a status such as delayed payment and clears the lane as quickly as possible. The customer can be billed later instead.

To compensate for network partitions or other failures that prevent webhook delivery, a reconciliation job should run every 5 minutes, scan the database for reservations that have been in the PENDING_PAYMENT state for too long, and query the PSP for their status.
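One pass of the reconciliation job might look like the sketch below. The record shape, the PSP status strings, and the query_psp callback are hypothetical stand-ins for the real PSP client.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def reconcile(payments: List[Dict], query_psp: Callable[[str], str], now: datetime,
              max_pending: timedelta = timedelta(minutes=10)) -> List[str]:
    """Resolve payments stuck in PENDING_PAYMENT by asking the PSP directly."""
    resolved = []
    for payment in payments:
        if payment["status"] != "PENDING_PAYMENT":
            continue
        if now - payment["updated_at"] < max_pending:
            continue  # the webhook may still arrive, leave it alone
        psp_status = query_psp(payment["payment_id"])
        if psp_status in ("SUCCEEDED", "FAILED"):  # assumed terminal states
            payment["status"] = psp_status
            payment["updated_at"] = now
            resolved.append(payment["payment_id"])
    return resolved
```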



Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Cron machine & cold storage: Past parking and payment records may be useful for future reference, auditing, etc. However, they are not accessed frequently, so a cron job could be set up to run monthly, pull records older than 1 year, and store them in S3.

Server: Traffic is very low, so a single machine could handle all of it. However, to ensure availability, we add redundancy at the server tier. We can run multiple machines (three is probably enough) to serve requests. Each incoming request is routed to a server using a round-robin mechanism.
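Round-robin selection itself is simple; a minimal sketch follows. The class name and server labels are made up, and a real load balancer would also track server health.

```python
import itertools
from typing import Iterable


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers: Iterable[str]) -> None:
        pool = list(servers)
        if not pool:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(pool)

    def next_server(self) -> str:
        return next(self._cycle)
```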

Database: As with the servers, we do not expect a large volume of data, and it can fit on one machine. However, to avoid downtime caused by a machine crash or maintenance, we should add database replicas. The database could run in master-slave mode, with the master responsible for copying data to the slaves. If the master becomes unavailable, a slave is promoted to be the new master.


Database: As mentioned above, the database should include a table holding the aggregated count of spots for each vehicle type in each parking lot, rather than a row per individual spot. This row should be used in conjunction with the reservation table to calculate availability. The update should be conditional on the spot count being greater than 0; if the condition fails, the reservation request fails. The reservation record should be created in the same transaction so that the two updates are atomic. To avoid the check-then-commit race condition, the fit-check should run inside the database transaction. An example SQL query follows.

BEGIN TRANSACTION;

-- 1. (Optional but recommended) Lock the specific parking lot to prevent Phantom Reads
SELECT pg_advisory_xact_lock(123);

-- 2. The Atomic Fit-Check and Insert
INSERT INTO reservations (id, lot_id, vehicle_type, start_time, end_time, status)
SELECT
    'res_999', 123, 'COMPACT', '2023-10-27T14:00:00Z', '2023-10-27T16:00:00Z', 'CONFIRMED'
WHERE (
    -- The Fit-Check: Count how many existing reservations overlap this time window
    SELECT COUNT(*)
    FROM reservations
    WHERE lot_id = 123
      AND vehicle_type = 'COMPACT'
      AND status IN ('CONFIRMED', 'ACTIVE')
      AND start_time < '2023-10-27T16:00:00Z' -- Existing start < Requested end
      AND end_time > '2023-10-27T14:00:00Z'   -- Existing end > Requested start
) < (
    -- The Capacity: Ensure the overlap count is less than the total allowed spots
    SELECT total_capacity
    FROM parking_lots
    WHERE id = 123
);

COMMIT;

In this way, the database makes sure that spots are available in the specific time window the user is booking. To avoid the situation where one customer leaves late while the next arrives early, we can pad each reservation with a larger time frame (for example, adding 15 minutes to the beginning and end of the booked time).
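The padded overlap test can be written out explicitly. This sketch mirrors the two overlap conditions in the query above and widens the requested window by a buffer; the 15-minute default and the function name are assumptions.

```python
from datetime import datetime, timedelta


def overlaps_with_buffer(existing_start: datetime, existing_end: datetime,
                         requested_start: datetime, requested_end: datetime,
                         buffer: timedelta = timedelta(minutes=15)) -> bool:
    """True if an existing reservation collides with the padded request window."""
    padded_start = requested_start - buffer
    padded_end = requested_end + buffer
    # Same shape as the SQL: existing start < requested end, existing end > requested start.
    return existing_start < padded_end and existing_end > padded_start
```

For example, a booking that ends at 14:00 does not overlap a request starting at 14:10 on its own, but it does once the request is padded by 15 minutes.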


Redis Cache for fast conflicts: To handle the situation where a large number of users contend for the same parking lot at the same time, the application server should first ask the Redis cache to decrement the available spot count. Only if that succeeds should it proceed to update the database. This makes sure that requests fail fast. In addition, the client should retry with exponential backoff so that it does not overwhelm the system. Note that the cache can contain stale data, so it is not the source of truth for the actual booking write.
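A rough sketch of both ideas, using an in-process counter in place of Redis. SpotGate, backoff_delays, the jitter strategy, and the default delays are all illustrative assumptions; in production the counter would live in Redis and be updated atomically there.

```python
import random
import threading
from typing import Callable, List


class SpotGate:
    """In-process stand-in for an atomic counter held in the cache."""

    def __init__(self, available: int) -> None:
        self._available = available
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Fail fast when the counter is exhausted; otherwise take one spot."""
        with self._lock:
            if self._available <= 0:
                return False
            self._available -= 1
            return True

    def release(self) -> None:
        """Give the spot back, e.g. when the database transaction fails."""
        with self._lock:
            self._available += 1


def backoff_delays(max_attempts: int, base: float = 0.1, cap: float = 5.0,
                   rng: Callable[[], float] = random.random) -> List[float]:
    """Exponential backoff with jitter: each delay is a random share of min(cap, base * 2^n)."""
    return [min(cap, base * (2 ** attempt)) * rng() for attempt in range(max_attempts)]
```

The jitter spreads retries out so that clients rejected at the same moment do not all come back at once.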


Write requests such as booking and payment should carry an idempotency key to avoid double processing.
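A minimal sketch of idempotent handling: the first response for a key is stored and replayed on retries. The IdempotentHandler wrapper and the in-memory dict are assumptions; a real service would persist keys in the database with an expiry.

```python
from typing import Any, Callable, Dict


class IdempotentHandler:
    """Run the wrapped handler at most once per idempotency key."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        self._responses: Dict[str, Any] = {}

    def handle(self, idempotency_key: str, request: Any) -> Any:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # retry: replay stored response
        response = self._handler(request)
        self._responses[idempotency_key] = response
        return response
```

A retried booking with the same key therefore returns the original reservation instead of creating a second one.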


As mentioned before, if a customer arrives late or does not show up and the delayed message queue has cancelled their reservation, the spot will be reclaimed.