My Solution for Design a Meeting Calendar System with Score: 8/10

by redzrdi

System requirements

Functional:

A user should be able to manage the lifecycle (create, update, delete) of meetings.
A user should be able to see all meeting instances for any given user within a given time window. This information enables a timeline view that is useful for meeting scheduling, as well as a personal calendar view for the user.
A user should be able to create recurring meetings. For a recurring meeting, the creator provides the normal scheduling information for the first occurrence plus a repeat pattern (a fixed number of days, or a custom pattern) and an end condition: either the schedule of the last occurrence or a number of occurrences.
The meeting creator should be able to reschedule or cancel a specific occurrence of a recurring meeting series.
The system integrates with external systems, such as the User Directory, Email, and resource management services, to provide an integrated experience to the user.
A UI component that presents a timeline view of all participants and resources is in scope. This UI may be accessed standalone or embedded within another application, such as an email client.
The following personas are to be supported:
1. *User* - a valid user of the organization as per the User Directory. A user can further act in the role of *Event Owner* or *Participant*.
2. *Admin* - has exclusive access to the admin APIs through which resources (e.g. meeting rooms) can be managed. However, resource management may be offloaded to an external system with which this system integrates.
3. *System User* - used for integrations with other systems such as the Email, User Directory, and Resource Management systems.

Meeting participants should be able to see the other participants' responses to an invite.

The system uses RBAC-based authorization. It assumes the roles are defined within the same user directory that it uses for normal operation.
For authentication, an OAuth-based system integrated with the user directory is assumed.
The system is assumed here to be a single-tenant system.
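
The recurring-meeting requirement above (first occurrence, a fixed-interval repeat pattern, and an end condition) can be sketched as a small expansion routine. This is only an illustration under assumptions: the names `Recurrence` and `occurrences` are hypothetical, only the "fixed number of days" pattern is covered, and custom patterns and per-occurrence reschedules or cancellations are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Recurrence:
    """Hypothetical repeat rule: a fixed day interval plus an end condition."""
    every_days: int                    # fixed number of days between occurrences
    count: Optional[int] = None        # end condition: total number of occurrences
    until: Optional[datetime] = None   # end condition: latest allowed start time


def occurrences(first_start: datetime, duration: timedelta,
                rule: Recurrence) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) for each occurrence of the series."""
    if rule.every_days <= 0:
        raise ValueError("every_days must be positive")
    if rule.count is None and rule.until is None:
        raise ValueError("an end condition (count or until) is required")
    start = first_start
    produced = 0
    while True:
        if rule.count is not None and produced >= rule.count:
            return
        if rule.until is not None and start > rule.until:
            return
        yield start, start + duration
        produced += 1
        start += timedelta(days=rule.every_days)


# Usage: a weekly one-hour meeting with three occurrences.
weekly = list(occurrences(datetime(2024, 1, 1, 9, 0),
                          timedelta(hours=1),
                          Recurrence(every_days=7, count=3)))
```

Expanding occurrences lazily like this is one option; storing only the rule and expanding at read time keeps writes cheap, at the cost of more work in the time-slot query.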

Non-Functional:

Strong Consistency - A successful meeting booking should be immediately visible everywhere in the system, so that double bookings are impossible.
High Availability - 99.99% service availability is assumed.
Resiliency - The system integrates with multiple external systems, some of which (such as Email and the User Directory) are critical infrastructure components. It has to implement resiliency best practices to tolerate their downtime and to avoid overloading them.
Scalability - The system should support generally high read traffic (as many users will have the calendar view open) and spikes of read/write volume during the typical work day.

Capacity estimation

Assume a medium-sized organization of 1000 people, with 30% of users online and trying to book concurrently. That is about 300 concurrent users; allowing for some headroom, the system has to support up to 350 tps of write traffic during the peak hours of the day.

For write traffic, assuming a 100 ms p99 response time, each thread handles 10 tps. Assuming mostly I/O-bound request handling on a hyperthreaded machine, about 10 threads per core can be kept busy. Thus 100 tps/core is achievable.

Considering the AWS EC2 "large" starting size (m5.large with 2 vCPU, 8 GB RAM), we get 200 tps/instance.

Thus the write traffic has to be load balanced across 2 instances (350 / 200, rounded up).
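
The write-path arithmetic above can be reproduced with a few lines. The function name `instances_needed` is an assumption for illustration; the inputs are the figures stated in this section.

```python
import math


def instances_needed(peak_tps: float, p99_latency_ms: float,
                     threads_per_core: int, cores_per_instance: int) -> int:
    """Rough sizing: each thread serves one request per p99 latency window."""
    tps_per_thread = 1000.0 / p99_latency_ms
    tps_per_instance = tps_per_thread * threads_per_core * cores_per_instance
    return math.ceil(peak_tps / tps_per_instance)


# 350 tps peak, 100 ms p99, 10 threads per core, 2 vCPU per m5.large.
write_instances = instances_needed(350, 100, 10, 2)
```

This is a back-of-the-envelope estimate only; it ignores database contention and treats each vCPU as a full core.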

For a high-availability setup on AWS, assuming EC2 instances have 99.5% availability and a single AZ provides 99.9% availability, we deploy 4 m5.large instances across 2 AZs (2 in each AZ). Additionally, DNS should use weighted round-robin routing to distribute the load evenly. There should be an ALB in each zone to further distribute the incoming traffic.
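
As a rough check of this layout against the 99.99% target, the composite availability can be computed as below. The sketch assumes failures are independent and that any single surviving instance counts as "available"; it does not account for capacity (one m5.large alone would not carry the full peak write load), so treat the result as an upper bound.

```python
def parallel(availability: float, replicas: int) -> float:
    """Availability of N redundant replicas, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** replicas


INSTANCE = 0.995   # assumed EC2 instance availability (from the text)
AZ = 0.999         # assumed single-AZ availability (from the text)

# An AZ serves traffic if the AZ itself is up and at least one of its 2 instances is up.
per_az = AZ * parallel(INSTANCE, 2)

# The service is up if at least one of the 2 AZs can serve traffic.
overall = parallel(per_az, 2)
```

Under these assumptions the two-AZ, four-instance layout clears 99.99%, while a single AZ with two instances would not.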

The read traffic could be higher than this (depending on the UI implementation), as there could be continuous polling from open clients. We assume a steady read traffic of about 500 tps.

Read instances aim to serve 80% of the traffic from an in-memory cache. A read served from the in-memory cache takes about 5 ms, so a thread can do 200 tps => 2000 tps/core => 4000 tps per EC2 instance (large size). This can be optimized further; the limiting factor is the memory needed for the cached data. For the sake of HA we maintain 3 EC2 instances in each AZ in a 2-AZ deployment with a multi-master configuration. To improve data locality, implement hash-based routing on the load balancer.
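
The hash-based routing idea can be sketched as follows: requests for the same participant always land on the same read instance, so that instance's in-memory cache stays warm for that user. The function name `route` and the instance names are hypothetical, and plain modulo hashing is used here for brevity; a consistent-hashing ring would remap fewer users when instances are added or removed.

```python
import hashlib
from typing import List


def route(user_id: str, instances: List[str]) -> str:
    """Pick a read instance deterministically from the participant id."""
    if not instances:
        raise ValueError("no instances available")
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Usage: 3 read instances per AZ in 2 AZs, as in the estimate above.
read_instances = [f"read-{az}-{n}" for az in ("az1", "az2") for n in range(3)]
target = route("uid1", read_instances)
```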

API design

Create Meeting -- POST /meetings
Update Meeting -- PUT /meetings/<meetingID>
Delete Meeting -- DELETE /meetings/<meetingID>
Get Meeting -- GET /meetings/<meetingID>
List time slots for User -- GET /timeSlots?participant=uid1&from=<date-time(ISO date time TZ format)>&to=<date-time>&participation=<CONFIRMED|ALL>&nextPageToken=<tokenStr>
RSVP Participation (RSVP for all occurrences of a meeting; also supports RSVP for specific time slots within the meeting) -- POST /meetings/<meetingID>/rsvp
Create Meeting Room -- POST /resources
Update Meeting Room -- PUT /resources/<resourceId>
Delete Meeting Room -- DELETE /resources/<resourceId>
List Meeting Rooms -- GET /resources?resourceType=meeting_room&min_capacity=2&max_capacity=2
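
As an illustration of the time-slot listing call, a client could build the request URL as below. The host name `calendar.example.com` and the helper name `time_slots_url` are assumptions; the query parameters are the ones listed above, with `from` and `to` sent as timezone-aware ISO 8601 date-times.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def time_slots_url(base_url: str, participant: str, start: datetime, end: datetime,
                   participation: str = "CONFIRMED",
                   next_page_token: Optional[str] = None) -> str:
    """Build the GET /timeSlots URL for one participant and time window."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("from/to must be timezone-aware")
    if participation not in ("CONFIRMED", "ALL"):
        raise ValueError("participation must be CONFIRMED or ALL")
    params = {
        "participant": participant,
        "from": start.isoformat(),
        "to": end.isoformat(),
        "participation": participation,
    }
    if next_page_token:
        params["nextPageToken"] = next_page_token
    # urlencode percent-escapes ':' and '+' in the timestamps.
    return f"{base_url}/timeSlots?{urlencode(params)}"


url = time_slots_url(
    "https://calendar.example.com",  # hypothetical host
    "uid1",
    datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc),
)
```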

Database design


High-level design

*Calendar Write Service* - This is a stateless service that can be elastically scaled out to maintain availability and scalability. This microservice has a multi-AZ deployment to ensure the required availability.

*DB master* - A relational DB such as PostgreSQL is used in a multi-master configuration. The master instances are deployed in multiple AZs.

*Calendar Read Service* - This service provides the most heavily used API, i.e. get time slots for a given user over a given time period. Clients use this information to show the participants' timeline view on the meeting scheduling screen and a user's own calendar view.

*Ingress* - Service discovery is facilitated by DNS and the Ingress Load Balancer (from the cloud provider). The load balancer routes traffic to instances in multiple AZs to guarantee high availability.

Request flows


Detailed component design

*Resource Manager* - This microservice is responsible for managing the lifecycle of meeting resources such as meeting rooms. It also contains the logic to automatically accept or decline a meeting request. The Resource Manager maintains an in-memory interval tree per meeting room, which lets it quickly decide whether a requested time slot is available.
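
A minimal sketch of the per-room availability check follows. Note the substitution: because accepted bookings for one room never overlap, a sorted list with binary search stands in for the interval tree here, which keeps the example short while still giving O(log n) lookups (inserts are O(n)). The class name `RoomSchedule` is hypothetical, and intervals are half-open, `[start, end)`.

```python
import bisect
from typing import List


class RoomSchedule:
    """Non-overlapping bookings for one room, kept sorted by start time."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._ends: List[int] = []

    def is_free(self, start: int, end: int) -> bool:
        if start >= end:
            raise ValueError("start must be before end")
        # Index of the first booking that starts strictly after `start`.
        i = bisect.bisect_right(self._starts, start)
        # The booking before it starts at or before `start`; it conflicts if still running.
        if i > 0 and self._ends[i - 1] > start:
            return False
        # The booking at index i conflicts if it begins before the request ends.
        if i < len(self._starts) and self._starts[i] < end:
            return False
        return True

    def book(self, start: int, end: int) -> bool:
        """Auto-accept the request if the slot is free, otherwise decline."""
        if not self.is_free(start, end):
            return False
        i = bisect.bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._ends.insert(i, end)
        return True


# Usage: times are minutes since midnight.
room = RoomSchedule()
accepted = room.book(9 * 60, 10 * 60)        # 09:00-10:00
declined = room.book(9 * 60 + 30, 10 * 60 + 30)  # overlaps the first booking
```

Since this state lives in memory, it would have to be rebuilt from the database on restart, and the database remains the source of truth for the final booking decision.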

Trade offs/Tech choices

A relational DB such as PostgreSQL is chosen over a NoSQL DB such as Cassandra or DynamoDB to ensure strong consistency, even at the cost of harder scalability and availability.

Failure scenarios/bottlenecks

There are critical external dependencies, such as the "User Directory System" and the "OAuth Server". The User Directory service is required for user validation in the write path, so the availability of the write APIs depends directly on this service.

*Mitigation* - A local cache of the User entities can be built within the user gateway to be able to locally validate participants even if the external system is down .
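
One way to sketch this mitigation is a small read-through cache that serves a stale entry when the directory cannot be reached. The names `UserCache`, `DirectoryUnavailable`, and the `fetch` callback are assumptions for illustration and do not correspond to any real directory client.

```python
import time
from typing import Callable, Dict, Tuple


class DirectoryUnavailable(Exception):
    """Hypothetical error raised when the User Directory cannot be reached."""


class UserCache:
    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, user_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                      # fresh hit, no directory call
        try:
            user = self._fetch(user_id)
        except DirectoryUnavailable:
            if entry is not None:
                return entry[1]                  # directory down: serve stale data
            raise                                # nothing cached: cannot validate
        self._entries[user_id] = (now, user)
        return user
```

The trade-off is that a user removed from the directory may still validate locally until the directory is reachable again, so the TTL and the maximum tolerated staleness need to be chosen deliberately.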

*OAuth Server* - This is a critical infrastructure component. Mitigating its unavailability is not possible, as any such approach would compromise security.

There is a dependency on the external Email Service for notifying participants. Although the system remains usable if the Email Service is down for some time, the user experience is degraded (users will not get an email notification, but they can check their calendar view and get meeting details through the calendar UI or other integration points).

*Mitigation* - A full mitigation is not possible. However, an async integration (with a retry policy) ensures that notifications are not lost and are eventually delivered.
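
The retry policy could look like the sketch below, which would run in a background worker that consumes notifications from a queue. The names `deliver_with_retry`, `EmailServiceError`, and the `send` callback are hypothetical, and the exponential backoff parameters are placeholders rather than recommended values.

```python
import time
from typing import Callable


class EmailServiceError(Exception):
    """Hypothetical error raised when the Email Service call fails."""


def deliver_with_retry(send: Callable[[dict], None], notification: dict,
                       max_attempts: int = 5, base_delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to send; back off exponentially between failed attempts."""
    for attempt in range(max_attempts):
        try:
            send(notification)
            return True
        except EmailServiceError:
            if attempt == max_attempts - 1:
                break
            sleep(base_delay_s * (2 ** attempt))   # 1s, 2s, 4s, ...
    # Caller should keep the notification (e.g. re-queue it) instead of dropping it.
    return False
```

Returning `False` rather than raising leaves the decision to the worker: to keep the "not lost" guarantee, an undelivered notification should go back to the queue or to a dead-letter store for a later attempt.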

Future improvements

Intelligent Scheduling Assistant - This microservice can recommend conflict-free time slots for a given set of participants and resources while a user is creating a new meeting.
Integration with an object storage service to enable attachments on an invite.