Codemia | Master System Design Interviews Through Active Practice

My Solution for Design a Meeting Calendar System with Score: 9/10

by utopia4715

System requirements

Functional:

User can view coworkers' calendars for next 1 year.
User can schedule a meeting with coworkers and external users.
Scheduling a meeting will send email invite to all the participants.
I assume a user interacts with this system through a browser.

Non-Functional:

Available
Response time is reasonably fast, such as < 500ms for each action taken.
Scalable
Consistency is important.

Capacity estimation

100M DAUs
Assuming each user schedules two meetings a day, and peak load is twice the average load,
the peak rate is 100000000 / (24 * 60 * 60) * 2 * 2 = about 4630 meetings booked per second.
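The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are mine, and the values are the assumptions stated above.

```python
# Back-of-the-envelope load estimate using the assumptions stated above.
DAU = 100_000_000            # daily active users
MEETINGS_PER_USER = 2        # assumed meetings scheduled per user per day
PEAK_FACTOR = 2              # assumed ratio of peak load to average load
SECONDS_PER_DAY = 24 * 60 * 60

average_per_second = DAU * MEETINGS_PER_USER / SECONDS_PER_DAY
peak_per_second = average_per_second * PEAK_FACTOR

print(round(average_per_second))  # about 2315 meetings/s on average
print(round(peak_per_second))     # about 4630 meetings/s at peak
```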

API design

view_schedules(user_id, employee_ids [], from, to)
user_id is the user requesting this information.
employee_ids is a list of employees whose schedule the user wants to see.
from and to define the time range.
This returns a JSON object: an array of employees. Each employee object has a list of meetings sorted by start time.
[employee_id: [meeting_name:, start_time:, end_time: ]]
Errors: the query may fail altogether, e.g., if the user does not have access rights. In this case, a 4xx error is returned.
It may also fail partially, e.g., it can return the schedules of some employees but not others. In this case, a 200 success code is returned, but the result includes an empty list and an error explanation for each employee whose schedule could not be fetched.
schedule_meeting(user_id, from, to, name, description, invitees)
This schedules a meeting and sends invites.
Invitees can include employees and external users. Employees are specified by employee IDs. External users are specified by email addresses.
If scheduling a meeting fails (e.g. internal server error), an appropriate error code, such as 5xx, is returned.
Upon success, this returns a meeting ID and a meeting status (e.g. meeting has been created).
check_status(user_id, meeting_id)
This returns the status of the meeting, e.g., invites sent or invites accepted.
cancel_meeting(user_id, meeting_id, notify_invitees = True)
This cancels a meeting. If notify_invitees is True, it sends a cancelation notice to the invitees.
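As an illustration of the partial-failure behavior of view_schedules(), here is a minimal sketch of how the response body could be assembled. The function name and the "meetings" and "error" field names are my assumptions; the text above only fixes the overall shape (one entry per employee, meetings sorted by start time, an empty list plus an explanation on failure).

```python
import json


def build_view_schedules_body(meetings_by_employee, errors):
    """Assemble the body of a 200 response for view_schedules().

    meetings_by_employee: employee_id -> list of meeting dicts
    errors: employee_id -> explanation, for employees whose lookup failed
    """
    body = []
    for employee_id in sorted(set(meetings_by_employee) | set(errors)):
        if employee_id in errors:
            # Partial failure: empty list plus an explanation for this employee.
            body.append({"employee_id": employee_id, "meetings": [],
                         "error": errors[employee_id]})
        else:
            # ISO 8601 UTC strings in one format sort chronologically.
            meetings = sorted(meetings_by_employee[employee_id],
                              key=lambda m: m["start_time"])
            body.append({"employee_id": employee_id, "meetings": meetings})
    return body


body = build_view_schedules_body(
    {"e1": [
        {"meeting_name": "Planning", "start_time": "2024-05-01T15:00:00Z",
         "end_time": "2024-05-01T16:00:00Z"},
        {"meeting_name": "Standup", "start_time": "2024-05-01T09:00:00Z",
         "end_time": "2024-05-01T09:15:00Z"},
    ]},
    {"e2": "schedule temporarily unavailable"},
)
print(json.dumps(body, indent=2))
```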

Database design

There would be 200M new meetings a day (100M DAUs * 2 meetings each).

Each meeting needs data such as name, description, list of invitees, status. I would estimate it to take 2KB / meeting.

That's 400GB of data every day.

In 2 years, it would require 400 * 365 * 2 / 1000 = 292 TB of data.
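The storage arithmetic can be checked the same way. This sketch uses decimal units (1 KB = 1000 bytes), matching the division by 1000 above; the constant names are mine.

```python
# Storage estimate from the figures above, in decimal units.
MEETINGS_PER_DAY = 200_000_000
BYTES_PER_MEETING = 2_000          # estimated 2 KB per meeting
RETENTION_YEARS = 2

daily_gb = MEETINGS_PER_DAY * BYTES_PER_MEETING / 1e9
total_tb = daily_gb * 365 * RETENTION_YEARS / 1000

print(daily_gb)   # 400.0 GB per day
print(total_tb)   # 292.0 TB over two years
```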

I think a relational DB is a good choice for storing meeting data because:

Strong consistency is important. Events such as a meeting being scheduled or canceled, or a user accepting an invite, must be reflected in every user's view.
This is one of those apps that people depend upon to be always correct. For example, if you show up to a meeting that nobody else attends, because the meeting was canceled but the cancelation message didn't reach you, you would be quite upset. Eventual consistency would not be desirable. We would prefer an RDB to NoSQL DBs.
Scalability would be a challenge for an RDB. However, there are several mitigations we can take:
Most meetings are scheduled within a couple of weeks. Once the meeting is completed, the information can be moved to a secondary storage.
Most meetings are scheduled within an organization (e.g. a company). We can partition data by organization.
NoSQL DBs (e.g. doc based, LSM based) would have higher scalability, but RDB is advantageous in terms of consistency and relational queries.

High-level design

See the diagram.

Request flows

The client sends all requests to the API Gateway.

I am introducing only one service, Meeting Service, because all the queries are related to the core functionality: CRUD for meetings. Therefore, all requests are handled by Meeting Service.

Meeting Service executes the CRUD operations (querying schedules, creating meetings, updating meetings, and so on) primarily against the Database.

When Meeting Service has to send emails (e.g. to notify invitees), it would create a message in Message Queue (e.g. Kafka).

Notification Service pulls messages from the queue, and sends email notifications to the invitees.
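The decoupling between Meeting Service and Notification Service can be sketched as a producer and a consumer around a queue. Here Python's in-process queue.Queue stands in for the real message queue (e.g. Kafka), and the function names, message fields, and send_email callback are hypothetical.

```python
import queue

# In-process stand-in for the Message Queue (a real deployment might use Kafka).
notification_queue = queue.Queue()


def enqueue_invites(meeting_id, invitee_emails):
    """Meeting Service side: publish one message per invitee and return at once."""
    for email in invitee_emails:
        notification_queue.put(
            {"type": "invite", "meeting_id": meeting_id, "email": email})


def drain_notifications(send_email):
    """Notification Service side: pull messages and send one email for each."""
    sent = 0
    while True:
        try:
            message = notification_queue.get_nowait()
        except queue.Empty:
            break
        send_email(message["email"], message["meeting_id"])
        notification_queue.task_done()
        sent += 1
    return sent


outbox = []  # records what would have been emailed
enqueue_invites("m-1", ["a@example.com", "b@example.com"])
sent_count = drain_notifications(lambda email, meeting_id: outbox.append((email, meeting_id)))
```

Because the request path only enqueues, a slow or failing email provider does not add latency to schedule_meeting().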

Meeting Service uses a cache (e.g. Redis) to store frequently accessed information, such as user metadata or meeting details.
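One common way to use such a cache is the cache-aside pattern, sketched below with a plain dict standing in for Redis. The class and method names are hypothetical. Since consistency matters for this system, the sketch invalidates an entry whenever the underlying meeting changes; that choice is my assumption, not something the design above specifies.

```python
class MeetingCache:
    """Cache-aside sketch: a dict stands in for an external cache such as Redis."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db   # callable: meeting_id -> meeting dict
        self._entries = {}

    def get(self, meeting_id):
        # On a miss, read from the database and remember the result.
        if meeting_id not in self._entries:
            self._entries[meeting_id] = self._load_from_db(meeting_id)
        return self._entries[meeting_id]

    def invalidate(self, meeting_id):
        # Call after any update or cancelation so readers never see stale data.
        self._entries.pop(meeting_id, None)


db_reads = []


def fake_db_lookup(meeting_id):
    db_reads.append(meeting_id)
    return {"meeting_id": meeting_id, "name": "Design review"}


cache = MeetingCache(fake_db_lookup)
cache.get("m-1")          # miss: goes to the database
cache.get("m-1")          # hit: served from the cache
cache.invalidate("m-1")   # e.g. after the meeting is updated
cache.get("m-1")          # miss again: reloads fresh data
```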

Detailed component design

schedule_meeting() would create an entry in RDB:

Meeting:

meeting_id
org_id
from // in minutes
to // in minutes
name
description
status // scheduled, ongoing, finished

Meeting_Invitee:

meeting_id
org_id
type (employee or external contact)
employee_id
email
status // invited, accepted, rejected

The Meeting_Invitee table links the Meeting table to the Employee table.

This data model supports later steps, such as the view_schedules() call.
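A minimal sketch of these two tables and of the write done by schedule_meeting(), using SQLite only so that the example is runnable. The column types and CHECK constraints are assumptions, and from/to are renamed to start_minute/end_minute because FROM and TO are SQL keywords.

```python
import sqlite3

# Columns follow the entity lists above; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE meeting (
    meeting_id   INTEGER PRIMARY KEY,
    org_id       INTEGER NOT NULL,
    start_minute INTEGER NOT NULL,
    end_minute   INTEGER NOT NULL,
    name         TEXT NOT NULL,
    description  TEXT,
    status       TEXT NOT NULL
                 CHECK (status IN ('scheduled', 'ongoing', 'finished'))
);
CREATE TABLE meeting_invitee (
    meeting_id  INTEGER NOT NULL REFERENCES meeting(meeting_id),
    org_id      INTEGER NOT NULL,
    type        TEXT NOT NULL CHECK (type IN ('employee', 'external')),
    employee_id INTEGER,   -- set when type = 'employee'
    email       TEXT,      -- set when type = 'external'
    status      TEXT NOT NULL
                CHECK (status IN ('invited', 'accepted', 'rejected'))
);
"""


def schedule_meeting(conn, org_id, start_minute, end_minute, name,
                     description, employee_ids, external_emails):
    """Insert one meeting and its invitee rows in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "INSERT INTO meeting (org_id, start_minute, end_minute, name,"
            " description, status) VALUES (?, ?, ?, ?, ?, 'scheduled')",
            (org_id, start_minute, end_minute, name, description),
        )
        meeting_id = cursor.lastrowid
        invitees = [(meeting_id, org_id, "employee", e, None) for e in employee_ids]
        invitees += [(meeting_id, org_id, "external", None, m) for m in external_emails]
        conn.executemany(
            "INSERT INTO meeting_invitee (meeting_id, org_id, type,"
            " employee_id, email, status) VALUES (?, ?, ?, ?, ?, 'invited')",
            invitees,
        )
    return meeting_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
meeting_id = schedule_meeting(conn, 1, 600, 660, "Design review", "Q3 plan",
                              [101, 102], ["guest@example.com"])
```

Writing the meeting and its invitees in one transaction means readers never see a meeting without its invitee rows, which fits the strong-consistency goal.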

To query the schedules of employees, Meeting Service looks up the employees in the Meeting_Invitee table. Based on the matching rows, it then looks up the Meeting table, filtered by time.

An optimization may be necessary to improve this lookup performance. In this data model, the service would have to look up every meeting a particular employee has ever been invited to before the time filter is applied, which gets more expensive as history accumulates. We can add time information, such as the date (or week or month), to the Meeting_Invitee table. The service can use this information to filter meetings before joining.
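The date-bucket optimization can be sketched as follows, again with SQLite for runnability. The meeting_day column, the index, and the helper names are hypothetical. The sketch assumes meetings last less than 24 hours, so it widens the day range by one day to catch meetings that start the day before the requested window and run into it.

```python
import sqlite3

MINUTES_PER_DAY = 24 * 60

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meeting (
    meeting_id INTEGER PRIMARY KEY, name TEXT,
    start_minute INTEGER, end_minute INTEGER
);
CREATE TABLE meeting_invitee (
    meeting_id INTEGER, employee_id INTEGER,
    meeting_day INTEGER  -- hypothetical bucket: start_minute // MINUTES_PER_DAY
);
CREATE INDEX idx_invitee_day ON meeting_invitee (employee_id, meeting_day);
""")


def add_meeting(meeting_id, name, start_minute, end_minute, employee_ids):
    conn.execute("INSERT INTO meeting VALUES (?, ?, ?, ?)",
                 (meeting_id, name, start_minute, end_minute))
    day = start_minute // MINUTES_PER_DAY
    conn.executemany("INSERT INTO meeting_invitee VALUES (?, ?, ?)",
                     [(meeting_id, e, day) for e in employee_ids])


def view_schedule(employee_id, from_minute, to_minute):
    """Return (name, start, end) rows overlapping [from_minute, to_minute)."""
    first_day = from_minute // MINUTES_PER_DAY - 1  # assumes meetings < 24 hours
    last_day = to_minute // MINUTES_PER_DAY
    return conn.execute(
        """
        SELECT m.name, m.start_minute, m.end_minute
        FROM meeting_invitee AS i
        JOIN meeting AS m ON m.meeting_id = i.meeting_id
        WHERE i.employee_id = ?
          AND i.meeting_day BETWEEN ? AND ?      -- narrows rows via the index
          AND m.start_minute < ? AND m.end_minute > ?
        ORDER BY m.start_minute
        """,
        (employee_id, first_day, last_day, to_minute, from_minute),
    ).fetchall()


add_meeting(1, "Standup", 10 * MINUTES_PER_DAY + 540, 10 * MINUTES_PER_DAY + 555, [101])
add_meeting(2, "Planning", 11 * MINUTES_PER_DAY + 600, 11 * MINUTES_PER_DAY + 660, [101, 102])
add_meeting(3, "Retro", 40 * MINUTES_PER_DAY + 600, 40 * MINUTES_PER_DAY + 660, [101])

rows = view_schedule(101, 10 * MINUTES_PER_DAY, 12 * MINUTES_PER_DAY)
```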

Trade offs/Tech choices

Sharding would be important as the use case scales.

org_id (Organization ID) would be a good choice as a sharding key. Most meetings are scheduled within one organization, including mostly employees of the organization. By sharding the data with org_id, accesses would be heavily localized, making caching effective.

Some organizations would be bigger, or more active, than others. As such, we need to carefully plan which organizations are served by which DB nodes. The idea behind consistent hashing - dynamically adjusting the responsibility of each node - can be applied to RDBs, but it is harder to implement there than in some NoSQL DBs. As such, we should carefully project the workload of each organization and assign it to an appropriate DB node.
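One way to combine planned placement with a default rule is a small routing function: organizations with a large projected workload are pinned to chosen nodes, and the rest are spread by a stable hash of org_id. This is only a sketch; the node names, the pinned mapping, and the function name are hypothetical.

```python
import hashlib

# Hypothetical node names and placements, for illustration only.
SHARD_NODES = ["db-0", "db-1", "db-2"]
PINNED_ORGS = {"org-big": "db-dedicated-0"}   # orgs placed by hand after workload projection


def shard_for_org(org_id):
    """Return the DB node responsible for an organization."""
    if org_id in PINNED_ORGS:
        return PINNED_ORGS[org_id]
    # A cryptographic digest gives the same result on every process and run,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    return SHARD_NODES[int.from_bytes(digest[:8], "big") % len(SHARD_NODES)]


print(shard_for_org("org-big"))
print(shard_for_org("org-42"))
```

Note that plain modulo hashing remaps most organizations when the node count changes, which is why the pinned table (or a full org-to-node directory) is the part that carries the careful planning described above.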

Failure scenarios/bottlenecks

Most major components, e.g., Meeting Service, Notification Service, Cache, Message Queue, are either stateless or support horizontal scaling. We should take advantage of these properties for fault tolerance (e.g. when one node goes down, another can take over) and scalability. We should monitor these nodes for service latency and resource usage. Auto-scaling can be applied to some components, e.g., Meeting Service.

The Database is trickier because an RDB is harder to scale horizontally than the other components.

A Primary-Secondary configuration (with read-only replicas) can be used to provide fault tolerance and read scalability. As discussed earlier, sharding also provides scalability.

Future improvements

Meetings have an important property: once a meeting is finished, its data becomes less important. Things like descriptions, names, or invitees won't change.

This presents an opportunity for optimization in the Database. We can have a service that periodically checks for finished meetings and moves their data out of the main DB to a secondary storage. The history of past meetings could still be important, so we would keep it in a less expensive, less performant storage.
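A periodic archival job along these lines could look like the sketch below. For runnability, a second SQLite table stands in for the secondary storage; in practice that would be a separate, cheaper store. The table layout, the cutoff parameter, and the function name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meeting (
    meeting_id INTEGER PRIMARY KEY, name TEXT, end_minute INTEGER, status TEXT
);
-- Stand-in for the cheaper secondary storage; same columns as meeting.
CREATE TABLE meeting_archive (
    meeting_id INTEGER PRIMARY KEY, name TEXT, end_minute INTEGER, status TEXT
);
""")
conn.executemany(
    "INSERT INTO meeting VALUES (?, ?, ?, ?)",
    [(1, "Old sync", 500, "finished"),
     (2, "Recent sync", 1500, "finished"),
     (3, "Upcoming", 3000, "scheduled")],
)


def archive_finished_meetings(cutoff_minute):
    """Move finished meetings that ended before cutoff_minute; return the count."""
    condition = "status = 'finished' AND end_minute < ?"
    with conn:  # copy and delete commit together, or not at all
        conn.execute(
            "INSERT INTO meeting_archive SELECT * FROM meeting WHERE " + condition,
            (cutoff_minute,),
        )
        deleted = conn.execute(
            "DELETE FROM meeting WHERE " + condition, (cutoff_minute,)
        ).rowcount
    return deleted


moved = archive_finished_meetings(cutoff_minute=1000)
```

Keeping a cutoff, rather than archiving a meeting the moment it finishes, leaves recently finished meetings in the main DB where they are still likely to be viewed.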

This would allow us to keep the main DB smaller, improving scalability and fault tolerance.