System requirements


Functional:

For Drivers:

  • Listing Parking Lots: [P0] Listing all available parking lots in the system for a specific area. Note that, for the sake of this interview, we don't consider geographic search by distance.
  • Detailed Information: [P0] Getting detailed information about a specific parking lot, including location, pricing, and availability.
  • Reservation: [P0] Ability to reserve a parking spot and see previous reservations.
  • Payment Process: [P0] Completing payment when leaving the parking lot.


For Parking Lot Admins:

  • Parking Lot Management: [P0] Listing all owned parking lots and viewing detailed information (availability, location, pricing).
  • Update Information: [P0] Updating the information of parking lots (e.g., prices, contact info).
  • Statistics and Reporting: [P0] Accessing statistical data about parking lots (revenue, space utilization).



Non-Functional:

  1. Scale: The system should support the load outlined in Capacity estimation below (about 100,000 active drivers and 500 parking lots).
  2. Cost: Start with a Minimum Viable Product (MVP) to minimize upfront costs. Focus on core functionality and a smaller user base to test the waters.
  3. Security: The system serves two personas (drivers and parking lot admins), so it needs proper security boundaries and access control. Drivers must not be able to access admin functionality or data, and vice versa.


Capacity estimation

  1. Expected Drivers: You could target about 100,000 active drivers using the system. This could include daily commuters, tourists, and residents who park regularly.
  2. Reservation Frequency: Assuming about 30% of these users reserve a parking spot at least once a week, that would mean around 30,000 reservation transactions weekly.
  3. Number of Parking Lots: The initial target could be around 500 parking lots in Seattle. This number could encompass a mix of public and private facilities.
  4. Capacity Per Lot: If each parking lot has an average capacity of 50 parking spaces, that would lead to a total available parking space of approximately 25,000 spaces across all lots.
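As a quick sanity check, the estimates above can be turned into rates. The sketch below only reproduces the arithmetic; the constant names are illustrative, and spreading reservations evenly over the week is a simplifying assumption (real traffic will peak around commute hours).

```java
// Back-of-envelope arithmetic for the capacity estimates (illustrative constant names).
final class CapacityEstimate {
    static final long ACTIVE_DRIVERS = 100_000;
    static final double WEEKLY_RESERVING_SHARE = 0.30; // 30% reserve at least once a week
    static final long PARKING_LOTS = 500;
    static final long AVG_SPACES_PER_LOT = 50;
    static final long SECONDS_PER_WEEK = 7L * 24 * 60 * 60; // 604,800

    /** Lower bound: each reserving driver is counted once per week. */
    static long weeklyReservations() {
        return Math.round(ACTIVE_DRIVERS * WEEKLY_RESERVING_SHARE);
    }

    /** Average write rate, assuming reservations are spread evenly over the week. */
    static double averageReservationsPerSecond() {
        return (double) weeklyReservations() / SECONDS_PER_WEEK;
    }

    static long totalSpaces() {
        return PARKING_LOTS * AVG_SPACES_PER_LOT;
    }
}
```

weeklyReservations() gives 30,000 and totalSpaces() gives 25,000, matching the numbers above; the average works out to roughly 0.05 reservations per second, so peak traffic rather than the average is what should drive sizing.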





API design

Derived from the use cases, the following APIs serve the different groups of users.


Some data structures we use below:


ParkingLotMetadata {
  String id,
  boolean hasSpace,
  float hourlyPrice,
  float dailyPrice,
  String address,
  String phoneNumber,
  boolean hasDisabledParking
}


ParkingLotInfo {
  String id,
  boolean hasSpace,
  float hourlyPrice,
  float dailyPrice,
  String address,
  String phoneNumber,
  boolean hasDisabledParking,
  List<Location> spaces
}


Location {
  boolean reserveOnly,
  boolean disabledParking,
  Enum locationType {S, M, L},
  boolean isOccupied,
  Integer floorNum,
  Integer positionNum
}


Reservation {
  Enum Status {Reserved, InProgress, Finished, Canceled} status,
  Date reservationDate,
  Duration reservationDuration,
  String parkingLotId,
  Integer positionNum
}


DurationType {
  HOUR,
  DAY
}


PaymentType {
  CreditCard,
  Paypal,
  ApplePay,
  GooglePay,
  Venmo
}



OwnedParkingLotMetadata {
  String id,
  float hourlyPrice,
  float dailyPrice,
  String address,
  String phoneNumber,
  boolean hasDisabledParking,
  Date ownUntil,
  String contractLink
}
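The Reservation status enum implies a lifecycle. One possible set of transitions is sketched below, assuming a reservation becomes InProgress when the driver arrives and Finished once payment completes. The constant names mirror Reservation.Status in Java style; the transition rules and the transitionTo helper are hypothetical, not part of the APIs here.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lifecycle for Reservation.Status (Reserved, InProgress, Finished, Canceled).
enum ReservationStatus {
    RESERVED, IN_PROGRESS, FINISHED, CANCELED;

    // Assumed transitions; FINISHED and CANCELED are terminal.
    private static final Map<ReservationStatus, Set<ReservationStatus>> ALLOWED = Map.of(
            RESERVED, EnumSet.of(IN_PROGRESS, CANCELED),  // driver arrives, or cancels
            IN_PROGRESS, EnumSet.of(FINISHED),            // driver pays and leaves
            FINISHED, EnumSet.noneOf(ReservationStatus.class),
            CANCELED, EnumSet.noneOf(ReservationStatus.class));

    boolean canTransitionTo(ReservationStatus next) {
        return ALLOWED.get(this).contains(next);
    }

    /** Returns the new status, or throws if the move is not allowed. */
    ReservationStatus transitionTo(ReservationStatus next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

Rejecting illegal moves centrally (for example, canceling a reservation that is already InProgress) keeps the status field consistent no matter which API updates it.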


For Drivers:

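  • Listing Parking Lots: [P0] Listing all available parking lots in the system for a specific area => List<ParkingLotMetadata> ListParkingLots(location: Enum, hasEmptySpace: boolean, priceUpperBound: Integer). This API can later be extended with more filtering, or even search functionality, as the product evolves. Example URL: GET /driver/parkingLots?location={location}&hasEmptySpace={hasEmptySpace}&priceUpperBound={priceUpperBound}. Passing the filters as query parameters keeps this endpoint from colliding with the per-lot detail path below.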
  • Detailed Information: [P0] Getting detailed information about a specific parking lot, including location, pricing, and availability => ParkingLotInfo GetParkingLot(String parkingLotId). Example URL: GET /driver/parkingLot/{parkingLotId}
  • Reservation: [P0] Ability to reserve a parking spot and see previous reservations. Two APIs are involved: 1. boolean reserve(parkingLotId, positionNumber) and 2. List<Reservation> listReservations(). Example URLs: POST /driver/reservation/new and GET /driver/reservation.
  • Payment Process: [P0] Completing payment when leaving the parking lot => paymentId pay(durationType: DurationType, duration: int, paymentType: PaymentType). To charge correctly, we need to know how long the vehicle stayed in the lot; this comes from the monitoring system on the admin side, and that data can be shared. Corresponding Get/List APIs would be required for customers to view previous payments; due to time constraints, we don't explore them here. Example URL: POST /driver/payment/new
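The reserve API has to stop two drivers from booking the same space at the same time. One minimal way to express that is a conditional write: insert the hold only if no row for that space exists yet. The sketch below uses ConcurrentHashMap.putIfAbsent as an in-memory stand-in; in the databases discussed in Database design, the analogous features would be Cassandra lightweight transactions (INSERT ... IF NOT EXISTS) or a DynamoDB conditional put using attribute_not_exists. The driverId parameter, the key format, and ignoring time slots are assumptions made to match the reserve(parkingLotId, positionNumber) signature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a conditional write ("insert only if no row exists").
final class ReservationStore {
    // Hypothetical layout: "parkingLotId#positionNumber" -> driverId currently holding the spot.
    private final ConcurrentMap<String, String> holds = new ConcurrentHashMap<>();

    private static String key(String parkingLotId, int positionNumber) {
        return parkingLotId + "#" + positionNumber;
    }

    /** Mirrors boolean reserve(parkingLotId, positionNumber); driverId is an added assumption. */
    boolean reserve(String parkingLotId, int positionNumber, String driverId) {
        // putIfAbsent is atomic: of two concurrent callers, exactly one sees null and wins.
        return holds.putIfAbsent(key(parkingLotId, positionNumber), driverId) == null;
    }

    /** Frees the spot, but only if it is still held by this driver. */
    boolean release(String parkingLotId, int positionNumber, String driverId) {
        return holds.remove(key(parkingLotId, positionNumber), driverId);
    }
}
```

Because the check and the write happen in one atomic step, there is no window between "is the spot free?" and "mark it taken" in which a second driver could slip in.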


For Parking Lot Admins:

  • Parking Lot Management: [P0] Listing all owned parking lots and viewing detailed information (availability, location, pricing) => List<OwnedParkingLotMetadata> ListOwnedParkingLots().
  • Update Information: [P0] Updating the information of parking lots (e.g., prices, contact info) => OwnedParkingLotInfo UpdateOwnedParkingLot(newOwnedParkingLotInfo). Note that this is an async API. Changing parking lot information (especially price) immediately could confuse drivers. Instead, whenever this information changes, drivers who have visited the parking lot before should receive a notification, and the change should take effect some time later (e.g. 3 months later). This gives existing drivers enough time to decide whether they need to change where they park day to day. It adds status tracking and extra complexity to the system, but provides a better UX, so we should do it if we have the capacity.
  • Statistics and Reporting: [P0] Accessing statistical data about parking lots (revenue, space utilization) => Statistic getStatistic(String parkingLotId)
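Because UpdateOwnedParkingLot only schedules a change, the service has to keep both the current and the pending value until the notice period elapses. A minimal sketch for the price field is below, assuming at most one pending change at a time and approximating "3 months" as 90 days (java.time.Instant does not support month arithmetic). ScheduledPrice and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;

// Hypothetical holder for a price that changes only after a notice period.
final class ScheduledPrice {
    private BigDecimal current;
    private BigDecimal pending;          // null when no change is scheduled
    private Instant pendingEffectiveAt;  // when the pending price becomes live

    ScheduledPrice(BigDecimal initial) {
        this.current = initial;
    }

    /** Called from UpdateOwnedParkingLot; returns the instant drivers start paying newPrice. */
    Instant schedule(BigDecimal newPrice, Instant requestedAt, Duration noticePeriod) {
        current = priceAt(requestedAt);  // fold in an earlier change that is already live
        pending = newPrice;
        pendingEffectiveAt = requestedAt.plus(noticePeriod);
        return pendingEffectiveAt;
    }

    /** The price in effect at the given instant; does not mutate state. */
    BigDecimal priceAt(Instant at) {
        boolean pendingIsLive = pending != null && !at.isBefore(pendingEffectiveAt);
        return pendingIsLive ? pending : current;
    }
}
```

The instant returned by schedule is what the notification to past visitors would quote as the effective date.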



Database design

In the ER diagram section, I have defined the different resources this application might use. Because the data access patterns are well understood and we need horizontal scaling for the data volume estimated above, I'd recommend going with a NoSQL database.


However, with this decision, there are several caveats:

  1. [Not limited to NoSQL] Distributed transactions might become a concern if we need to handle financial operations ourselves. For now, I believe we will integrate with third parties such as Venmo or banks, but in the future we might expand into our own payment type. Then we would need to think about the distributed transactions involved (e.g. using two-phase commit, 2PC). Note that this applies to relational databases (RDBMS) as well.
  2. Complex queries, or access patterns not covered here (e.g. for business analysis), need to run in a data warehouse or data lake.
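For reference, the core of the two-phase commit protocol mentioned above fits in a few lines: the coordinator asks every participant to prepare and commits only if all of them vote yes; otherwise it tells everyone to abort. This is a minimal in-memory sketch with hypothetical Participant and coordinator types; a real implementation also needs a durable coordinator log, timeouts, and recovery for participants that crash after voting.

```java
import java.util.List;

// Hypothetical participant in a distributed transaction (e.g. our ledger and a payment provider).
interface Participant {
    boolean prepare(String txId);  // vote yes (true) or no (false); a yes vote must survive a crash
    void commit(String txId);
    void abort(String txId);       // participants that never prepared treat this as a no-op
}

// Minimal two-phase commit coordinator: no coordinator log, timeouts, or recovery.
final class TwoPhaseCommitCoordinator {
    /** Returns true if the transaction committed on every participant. */
    static boolean execute(String txId, List<Participant> participants) {
        // Phase 1 (prepare): collect votes, stopping at the first "no".
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                allYes = false;
                break;
            }
        }
        // Phase 2 (commit/abort): broadcast the single decision to everyone.
        for (Participant p : participants) {
            if (allYes) {
                p.commit(txId);
            } else {
                p.abort(txId);
            }
        }
        return allYes;
    }
}
```

The key property is that no participant commits until every participant has promised it can; the price is that participants hold resources between the two phases, which is why 2PC is usually avoided unless it is genuinely needed.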


Among the various types of database, a wide-column database fits our data access patterns best. If we want more control and have the resources to operate it ourselves, we can go with Cassandra. If we prefer a cloud-managed solution, DynamoDB (a managed key-value store with a similar partition key/sort key model) is a good choice. The decision depends on our cost budget and the overall infrastructure direction.
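To make the access-pattern argument concrete: in a wide-column store, each query should be served from one partition, with rows kept sorted by a clustering key (Cassandra) or sort key (DynamoDB). For listReservations(), one natural layout, assumed here, is partition key driverId and clustering key reservationDate, so a driver's history is a single ordered partition read. The in-memory model below only illustrates that shape; it is not a client for either database, and WideColumnTable is a hypothetical name.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of one wide-column table (not thread-safe, illustration only):
// partition key -> rows kept sorted by clustering key.
final class WideColumnTable<V> {
    private final Map<String, NavigableMap<Instant, V>> partitions = new HashMap<>();

    /** Upsert: a row with the same partition and clustering key is overwritten. */
    void put(String partitionKey, Instant clusteringKey, V row) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(clusteringKey, row);
    }

    /** One-partition read, newest first: the shape of listReservations() for a single driver. */
    List<V> newestFirst(String partitionKey, int limit) {
        NavigableMap<Instant, V> rows =
                partitions.getOrDefault(partitionKey, Collections.emptyNavigableMap());
        List<V> out = new ArrayList<>();
        for (V row : rows.descendingMap().values()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(row);
        }
        return out;
    }
}
```

Queries that do not start from a known partition key (for example, "all reservations above some price across every driver") do not fit this model, which is why such queries are pushed to the data warehouse.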





High-level design

At a high level, the system needs the following components:

  1. API Gateway (APIGW), a front-end service handling authentication and authorization (AuthN/AuthZ) and throttling.
  2. Load balancer to route requests to hosts, using whichever routing algorithm fits the traffic (e.g. round robin or least connections) and performing reliable health checks.
  3. Application hosts to handle requests and read from or write to the databases as needed.
  4. Database. It stores the required data, including parking lot, user, payment, and reservation data.
  5. Statistic Component. Since parking lot owners usually don't need statistics often (e.g. every two weeks or monthly), we can run it as a batch job instead of a stream processor. It aggregates the statistics that parking lot owners care about.
  6. Business warehouse component. This gives the administrators of the entire system the business data they need to make decisions. The data comes from the primary data store via a batch ETL job that keeps the business data warehouse in sync with it.
  7. Event Notification Component. updateParkingLot, used by owners, is an async API. Rather than having owners repeatedly poll for its status, we can push notifications to them (text, email, etc.) whenever the status changes.
  8. Monitoring component. An internal component that monitors system availability, latency, and other operational metrics.
  9. Ticketing system. Drivers and owners need a way to get support, e.g. when a driver has a billing concern or an owner has an issue with parking lot pricing.
  10. [Optional] Metering system. We need to bill owners for using this application, and possibly drivers too (e.g. at a private or high-end garage).
  11. [Optional] Payment Component. This component would integrate with various third-party payment methods and own that domain. However, if we don't support many payment methods, it can live in the application hosts so we don't end up in microservice hell.
  12. [Optional] Separate clients for different types of users. This makes it clearer to users which hat they are wearing when using the client, either driver or parking lot admin.
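The Statistic Component's batch job could be as simple as a group-by over completed parking sessions. The sketch assumes each record carries the lot id, time parked, and amount paid, that every session falls inside the reporting window, and that utilization means occupied space-time divided by (capacity × window length). All type names are hypothetical.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input/output types for the Statistic Component's batch job.
record CompletedParking(String parkingLotId, Duration parkedFor, BigDecimal amountPaid) {}

record LotStatistic(String parkingLotId, BigDecimal revenue, double utilization) {}

final class StatisticsBatchJob {
    /**
     * Aggregates one reporting window. capacities maps parkingLotId -> number of spaces.
     * Assumes every session lies entirely inside the window.
     * utilization = occupied space-minutes / (spaces * window minutes).
     */
    static Map<String, LotStatistic> run(List<CompletedParking> sessions,
                                         Map<String, Integer> capacities,
                                         Duration window) {
        Map<String, BigDecimal> revenue = new HashMap<>();
        Map<String, Long> occupiedMinutes = new HashMap<>();
        for (CompletedParking s : sessions) {
            revenue.merge(s.parkingLotId(), s.amountPaid(), BigDecimal::add);
            occupiedMinutes.merge(s.parkingLotId(), s.parkedFor().toMinutes(), Long::sum);
        }

        // One statistic per known lot, including lots with no sessions in this window.
        Map<String, LotStatistic> result = new HashMap<>();
        for (Map.Entry<String, Integer> lot : capacities.entrySet()) {
            String id = lot.getKey();
            double availableMinutes = (double) lot.getValue() * window.toMinutes();
            double usedMinutes = occupiedMinutes.getOrDefault(id, 0L);
            double utilization = availableMinutes == 0 ? 0.0 : usedMinutes / availableMinutes;
            result.put(id, new LotStatistic(id, revenue.getOrDefault(id, BigDecimal.ZERO), utilization));
        }
        return result;
    }
}
```

Since the job runs every few weeks, recomputing each window from scratch like this is usually simpler than maintaining incremental counters.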





Request flows

For the sake of the interview, the request flow below walks through a sample API to demonstrate the workflow.

  1. List<ParkingLotMetadata> ListParkingLots(location: Enum, hasEmptySpace: boolean, priceUpperBound: Integer). A driver initiates this request to see all available parking lots. The request goes through the APIGW and the load balancer to the application hosts. The server then queries an index on location to get all available parking lots in the current location (e.g. Seattle). This query is relatively expensive and is issued by a large number of drivers, so we would add a cache (e.g. Redis, Memcached) to reduce traffic to the data store.
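The cache in this flow follows the cache-aside pattern: read the cache first, fall back to the location index on a miss, then populate the cache with a TTL. In the sketch below, an in-process map stands in for Redis or Memcached and the loader function stands in for the database query; ParkingLotListCache is a hypothetical name, and concurrent misses for the same location may both hit the index.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Cache-aside sketch: an in-process map stands in for Redis/Memcached.
final class ParkingLotListCache<T> {
    private record Entry<E>(List<E> value, Instant expiresAt) {}

    private final Map<String, Entry<T>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<T>> loadFromIndex; // hypothetical query on the location index
    private final Duration ttl;
    private final Supplier<Instant> clock;

    ParkingLotListCache(Function<String, List<T>> loadFromIndex, Duration ttl, Supplier<Instant> clock) {
        this.loadFromIndex = loadFromIndex;
        this.ttl = ttl;
        this.clock = clock;
    }

    List<T> listByLocation(String location) {
        Instant now = clock.get();
        Entry<T> hit = cache.get(location);
        if (hit != null && now.isBefore(hit.expiresAt())) {
            return hit.value();  // cache hit: no database read
        }
        // Miss or expired entry: read from the data store and repopulate the cache.
        List<T> fresh = List.copyOf(loadFromIndex.apply(location));
        cache.put(location, new Entry<>(fresh, now.plus(ttl)));
        return fresh;
    }

    /** Drop an entry early, e.g. when availability in that location changes. */
    void invalidate(String location) {
        cache.remove(location);
    }
}
```

Because hasSpace changes as cars come and go, the TTL should be short (seconds rather than hours), or the entry should be invalidated when availability changes; this trades a little staleness for far fewer index queries.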







Detailed component design

I'd like to dive deep into the following components:

  1. Notification service. With 100,000 active drivers, it may seem that we need a distributed notification service, built either on open-source solutions or on a cloud-managed one (e.g. Amazon SNS). If we build one from scratch, we would need to cache the target users for each kind of event. However, since the notification volume should be low, we don't necessarily need a fully-fledged distributed notification system. A simple notification service can use a small pool of hosts to send the actual notifications, with ZooKeeper as a configuration service managing the list of sending hosts and a load balancer distributing notification requests across them.
  2. Another interesting component is the payment logic, which was not included in the sequence diagram. It takes payment information from the user (e.g. credit card info, or a redirect to a third party such as a Venmo or PayPal account) and integrates with those different upstream payment providers. It also needs to calculate the total price, which has a couple of billing cases, listed below. Once we have the total price, we send it to the corresponding payment provider to charge the customer and complete the payment.
    1. Normal hourly rate, this is simple.
    2. Daily rate, it is also straightforward as long as we know the start date and end date of the parking.
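A possible shape for the price calculation covering both billing cases is below, using BigDecimal for money. The rounding rules are assumptions, not requirements from this design: hourly stays are billed per started hour with a one-hour minimum (measured to the minute), and daily stays are billed per calendar day touched, counting both the start and end dates.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

enum DurationType { HOUR, DAY }

// Hypothetical price calculator; entry/exit times would come from the lot's monitoring data.
final class PriceCalculator {
    static BigDecimal totalPrice(LocalDateTime entry, LocalDateTime exit, DurationType type,
                                 BigDecimal hourlyPrice, BigDecimal dailyPrice) {
        if (exit.isBefore(entry)) {
            throw new IllegalArgumentException("exit is before entry");
        }
        return switch (type) {
            case HOUR -> {
                // Assumption: every started hour is billed in full, with a one-hour minimum.
                long minutes = Duration.between(entry, exit).toMinutes();
                long hours = Math.max(1, (minutes + 59) / 60);
                yield hourlyPrice.multiply(BigDecimal.valueOf(hours));
            }
            case DAY -> {
                // Assumption: every calendar day touched by the stay is billed, inclusive.
                long days = ChronoUnit.DAYS.between(entry.toLocalDate(), exit.toLocalDate()) + 1;
                yield dailyPrice.multiply(BigDecimal.valueOf(days));
            }
        };
    }
}
```

For example, a 61-minute stay at 3.50 per hour is billed as two hours (7.00), and an overnight stay from 09:00 to 08:00 the next day is billed as two days.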




Trade offs/Tech choices

  1. Database: NoSQL is the better fit because the data access patterns are known and it scales horizontally. Its flexible schema also makes it easier to extend as requirements evolve. Within NoSQL, we choose a wide-column database given our use cases. SQL would be better for use cases where we favor normalization and want more flexible queries (e.g. business reports or usage-pattern queries).
  2. Compute platform: we will go with a host-based approach, since we have a large volume of customers and don't want to commit to higher-level abstractions such as serverless (e.g. AWS Lambda).
  3. Since the notification volume is low, instead of building a fully-fledged distributed notification service, we'd go with a simplified version: a small pool of hosts that send the notifications.
  4. Client we vend: to ensure proper error handling, we can equip the client with exponential backoff retries (or even a token bucket algorithm that budgets retries based on the server status as observed from the client side). We can also use a circuit breaker, although it can trip on false positives and make the service appear less available than it actually is.
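A minimal client-side retry helper with capped exponential backoff and full jitter (a random delay between zero and the current exponential bound) could look like the following. The parameter names and the isRetryable predicate (e.g. true only for 503 responses) are assumptions; the token bucket and circuit breaker mentioned above are not included.

```java
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Client-side retry with capped exponential backoff and full jitter.
final class RetryWithBackoff {
    static <T> T call(Callable<T> op,
                      Predicate<Exception> isRetryable, // e.g. true only for 503 responses
                      int maxAttempts,
                      long baseDelayMs,
                      long maxDelayMs,
                      Random random) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e; // out of attempts, or an error that retrying will not fix
                }
                // Bound grows as base * 2^(attempt - 1), capped at maxDelayMs
                // (the shift is capped too, to avoid overflow).
                long bound = Math.min(maxDelayMs, baseDelayMs * (1L << Math.min(attempt - 1, 20)));
                // Full jitter: sleep a uniformly random time in [0, bound].
                long sleepMs = (long) (random.nextDouble() * (bound + 1));
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

Randomizing the whole delay spreads out retries from many clients that failed at the same moment, so they don't hit a recovering server in synchronized waves.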





Failure scenarios/bottlenecks

  1. Sync API failure when dependencies are down: we return HTTP 503 Service Unavailable (ServerUnavailableException).
  2. Async API failure
    1. If the error occurs before the request enters our async workflow (which starts at the distributed queue), we handle it as above, i.e. return the error to the customer and rely on them to retry.
    2. If the error happens within the async workflow, we apply retries, since latency doesn't matter here. For example, if we fail to process a message from the queue, we put it back, annotated with the number of times it has been retried, so we can retry it later when the system is more stable. In the worst case, where retries cannot mitigate the issue, the message is moved to a dead-letter queue and an alert is sent to engineers for manual intervention.
  3. Database failure: to ensure durability, the database replicates data so we don't lose it when some storage nodes are down. Replication can be single-leader, multi-leader, or leaderless. Cassandra uses leaderless replication, which sends each write/read request to multiple replicas; the request succeeds once a configurable number of replicas (the consistency level, often a majority via QUORUM) have replied. However, leaderless replication has trouble with concurrent writes: without a single leader, it is hard to tell which write happened first, so a conflict resolution strategy is needed. Cassandra uses last-write-wins based on timestamps.
  4. Host failures: health checks run continuously, both on the host itself (local health check) and at the load balancer (liveness health check). When a host is unhealthy, the load balancer stops sending requests to it and routes traffic to other hosts. If it keeps failing for a certain period, it is replaced by a new host. Be aware of the fail-open practice in load balancer design: if all hosts are reported unhealthy, the load balancer routes traffic to all of them anyway. This protects against health checks that include dependency checks, where a single failing dependency would otherwise mark the entire fleet unhealthy at once. An additional option is to run dependency checks in a separate process that only alerts, instead of automatically taking hosts out of service, so false positives don't impact the entire fleet. Anomaly detection is something we can build as well, but it can be expensive and is better suited to low-severity alerts that surface hidden issues.
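The async retry path from item 2 above (put the message back with an attempt counter, then dead-letter it) can be sketched as follows. The in-memory deque stands in for the distributed queue, the alert is just a log line, and there is no redelivery delay; a real queue would typically delay redelivery before the next attempt. QueueMessage and AsyncWorker are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// A message plus the number of processing attempts made so far.
record QueueMessage(String id, String body, int attempts) {
    QueueMessage nextAttempt() {
        return new QueueMessage(id, body, attempts + 1);
    }
}

// Async worker sketch: an in-memory deque stands in for the distributed queue.
final class AsyncWorker {
    private final Deque<QueueMessage> queue = new ArrayDeque<>();
    private final List<QueueMessage> deadLetterQueue = new ArrayList<>();
    private final int maxAttempts;
    private final Predicate<QueueMessage> handler; // true = processed successfully

    AsyncWorker(int maxAttempts, Predicate<QueueMessage> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    void enqueue(QueueMessage m) {
        queue.addLast(m);
    }

    /** Processes until the queue is empty. No redelivery delay here, unlike a real queue. */
    void drain() {
        while (!queue.isEmpty()) {
            QueueMessage m = queue.pollFirst().nextAttempt();
            boolean ok;
            try {
                ok = handler.test(m);
            } catch (RuntimeException e) {
                ok = false; // treat handler crashes like failures
            }
            if (ok) {
                continue;
            }
            if (m.attempts() >= maxAttempts) {
                deadLetterQueue.add(m);
                alertEngineers(m);
            } else {
                queue.addLast(m); // put it back, annotated with the attempt count
            }
        }
    }

    List<QueueMessage> deadLetters() {
        return List.copyOf(deadLetterQueue);
    }

    private static void alertEngineers(QueueMessage m) {
        // Placeholder for a real paging/alerting integration.
        System.err.println("ALERT: message " + m.id() + " moved to DLQ after " + m.attempts() + " attempts");
    }
}
```

Carrying the attempt count on the message itself means any worker that picks it up can decide whether to retry or dead-letter it, without shared state between workers.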


Future improvements

  1. Monitoring: I didn't cover it much above, but I want to highlight its importance. The ELK stack (Elasticsearch, Logstash, and Kibana) is an open-source way to set up monitoring, and there are cloud-based solutions such as Amazon CloudWatch as well. These help the service team monitor the health of the service, and help business owners make decisions based on business metrics.
  2. Today, this parking lot design is simple and assumes that all issues within the parking lot are handled by security staff. In the future, if we want a fully automated parking lot, we can deploy sensors in the parking lot to detect what's going on. For example, if a customer's vehicle is too large for a small parking space, they would get a notification asking them to move to another space, or risk being towed.