Design A Task Scheduler - System Design

System requirements

Functional:

Task creation - create/delete
Task execution - start/stop
Task organization - a task can be executed after the completion of a previous task
Task prioritization - critical tasks can be executed ahead of less critical tasks
User notification - completion, failure, retrial, etc.
Configurable latency - users can define an acceptable latency for task execution.

Non-Functional:

Fault tolerance. Implement seamless failover to handle master or worker node failures without service disruption.
Scalability. The system should be able to handle thousands of tasks executing concurrently.
Reliability. The system should be highly reliable so that tasks execute as expected.

Capacity estimation

Task creation 1000/s
Task execution 10000/s
Concurrent tasks: assuming each task takes 5 minutes, 10000 * 300 = 3 million tasks running concurrently
Average size of the data each task creates is 1 KB, so 10 MB of data is created per second and 10 * 60 * 60 * 24 = 864,000 MB = 864 GB per day
If we store data for 2 years, it would be 864 GB * 365 * 2 = 630,720 GB, or about 631 TB
Bandwidth usage: 10 MB/s * 8 bits/byte * 3 = 240 Mbps
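
The storage figures above can be double-checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and decimal units (1 MB = 1000 KB) are assumed.

```python
# Back-of-the-envelope capacity estimate using the figures from this section.
TASKS_EXECUTED_PER_SEC = 10_000      # task executions per second
AVG_TASK_DURATION_SEC = 5 * 60       # each task runs for about 5 minutes
DATA_PER_TASK_KB = 1                 # data produced by each task
RETENTION_YEARS = 2

# Concurrent tasks = arrival rate * average duration
concurrent_tasks = TASKS_EXECUTED_PER_SEC * AVG_TASK_DURATION_SEC

# Decimal units assumed: 1 MB = 1000 KB, 1 GB = 1000 MB, 1 TB = 1000 GB
mb_per_sec = TASKS_EXECUTED_PER_SEC * DATA_PER_TASK_KB / 1000
gb_per_day = mb_per_sec * 60 * 60 * 24 / 1000
tb_total = gb_per_day * 365 * RETENTION_YEARS / 1000

print(concurrent_tasks, mb_per_sec, gb_per_day, round(tb_total))
```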

API design

POST /api/v1/user/login

Params userName/email, password

Return status, accessToken, refreshToken

Task creation

POST /api/v1/task/create

Params accessToken, taskName, taskDesc, maxRetries, userId, priority, associations (upstream or downstream tasks of the current task), payload, code to be executed (submitted through the page; could be a SQL query or a code snippet)
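
As a rough illustration, the request body for the task creation endpoint might be assembled like this. The helper name `build_create_task_body`, the `codeSnippet` key, the default values, and the shape of `associations` are assumptions made for the sketch, not part of the API definition above.

```python
import json


def build_create_task_body(access_token, task_name, user_id, priority,
                           task_desc="", max_retries=0, associations=None,
                           payload=None, code_snippet=""):
    """Build a JSON body for the task creation endpoint (illustrative only)."""
    if max_retries < 0:
        raise ValueError("maxRetries must be non-negative")
    body = {
        "accessToken": access_token,
        "taskName": task_name,
        "taskDesc": task_desc,
        "maxRetries": max_retries,
        "userId": user_id,
        "priority": priority,
        # Assumed shape: IDs of upstream and downstream tasks.
        "associations": associations or {"upstream": [], "downstream": []},
        "payload": payload or {},
        # Assumed key for the code to be executed (SQL query or snippet).
        "codeSnippet": code_snippet,
    }
    return json.dumps(body)


# Example: a task that runs a SQL query after task "t-1" finishes.
print(build_create_task_body(
    "token-123", "daily-report", "u-42", priority=1, max_retries=3,
    associations={"upstream": ["t-1"], "downstream": []},
    code_snippet="SELECT COUNT(*) FROM orders",
))
```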

Task execution

POST /api/v1/task/execute

Params accessToken, taskId, dateTime

Task cancelation

POST /api/v1/task/cancel

Params accessToken, taskId

Database design

Task metadata table

taskId, taskName, payload, maxRetries, priority, associations, taskDesc, codeSnippet, createTime.

Task execution table

taskId, startTime, stopTime, duration, status, errorMessage
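
The two tables could be modeled roughly as follows. This is a sketch: the `TaskStatus` values, the Python types, and the choice to derive `duration` from the timestamps instead of storing it are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    # Hypothetical status values; the design does not enumerate them.
    PENDING = "pending"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskMetadata:
    task_id: str
    task_name: str
    priority: int
    max_retries: int = 0
    task_desc: str = ""
    code_snippet: str = ""
    payload: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)  # related task IDs (assumed)
    create_time: datetime = field(default_factory=datetime.now)


@dataclass
class TaskExecution:
    task_id: str
    start_time: datetime
    status: TaskStatus = TaskStatus.EXECUTING
    stop_time: Optional[datetime] = None
    error_message: Optional[str] = None

    @property
    def duration(self) -> Optional[float]:
        """Seconds between start and stop, or None while still running."""
        if self.stop_time is None:
            return None
        return (self.stop_time - self.start_time).total_seconds()
```

Deriving `duration` avoids keeping a redundant column in sync; storing it, as the table above lists, trades that for cheaper queries.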

Based on our requirements, about 3 million tasks would be executing concurrently, so I would prefer a non-relational database (Cassandra or MongoDB) over a relational database for this design.

To further speed up processing and keep latency minimal, we can cache the data in a store such as Redis or Memcached.

We also introduce Elasticsearch to store log data and visualize it through Kibana, so we can easily inspect the logs and track down issues.

High-level design

Show as diagram

Auth Service - Handles token generation

Scheduling Service - Receives and processes task creation and execution requests, persists the metadata into the database, and publishes tasks to the message queue

Execution Service - Consumes tasks from the message queue and processes them based on priorities and dependencies

Notification Service - Notifies users with exception logs if a task fails

Sync Service - Manages real-time updates between the database and the cache

Monitor Service - Gathers system and performance metrics and reports them to Grafana

Request flows

Authentication flow

User makes a request -> Rate limiter checks the limits -> request is forwarded to the Auth Service if it passes -> Auth Service generates a token and returns it

Task creation flow

User creates a task -> Rate limiter checks the limits -> Load balancer routes to the Scheduling service if it passes -> if the token is valid, persist the metadata into the database

Task execution flow

User starts a task -> Rate limiter checks the limits -> Load balancer routes to the Scheduling service if it passes -> if the token is valid, push the task to the queue and update its status to executing

Executor service takes the task and puts it into an internal priority queue -> polls the highest-priority task and checks its dependencies -> if it depends on another task, checks the status of the upstream task -> starts processing if the upstream task has finished, otherwise puts this task on hold and processes another one.
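
The executor-side flow can be sketched with an in-memory heap. The function name `run_ready_tasks`, the tuple layout, and the convention that a lower number means higher priority are assumptions for illustration; real execution is replaced by appending to a list.

```python
import heapq


def run_ready_tasks(tasks, finished):
    """Run tasks in priority order, holding any whose upstream is unfinished.

    tasks: list of (priority, task_id, upstream_ids) tuples; a lower
           priority number is treated as more urgent (assumed convention).
    finished: set of already completed task IDs; updated in place.
    Returns the order in which tasks were executed. Tasks whose upstream
    never finishes stay on hold and are not returned.
    """
    heap = list(tasks)
    heapq.heapify(heap)
    executed = []
    on_hold = []
    while heap:
        priority, task_id, upstream = heapq.heappop(heap)
        if all(dep in finished for dep in upstream):
            executed.append(task_id)  # stand-in for real execution
            finished.add(task_id)
            # A finished task may unblock held tasks, so requeue them.
            for held in on_hold:
                heapq.heappush(heap, held)
            on_hold.clear()
        else:
            on_hold.append((priority, task_id, upstream))
    return executed


# "report" has the highest priority but must wait for "load".
order = run_ready_tasks(
    [(1, "report", ("load",)), (2, "load", ()), (3, "cleanup", ())],
    finished=set(),
)
print(order)  # ['load', 'report', 'cleanup']
```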

Detailed component design

1. At most once

We should configure the message queue so that each message is consumed at most once, meaning the same task is executed only once by a single consumer.

2. Manage priorities

In one node: once a message is consumed by the Executor service, it is placed into a priority queue. An available executor then polls messages out and processes them in priority order.

Multiple nodes: each service instance is not aware of the priorities held by the others, so a low-priority task may be executed before a high-priority one.

We can leverage a Redis ZSet (sorted set) to keep tasks ordered by priority; the Execution service would fetch the status from Redis before starting to process a task.
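
A minimal sketch of the shared priority queue follows. To keep it runnable without a server, it uses a small in-memory stand-in that mirrors the shape of redis-py's `zadd(name, mapping)` and `zpopmin(name, count)` calls; the key name `tasks:pending`, the helper names, and the convention that a lower score means higher priority are assumptions. With a real deployment, a `redis.Redis(decode_responses=True)` client would be passed in instead.

```python
class InMemoryZSetClient:
    """Tiny stand-in exposing the two sorted-set calls used below.

    It is only for illustration and is not a Redis replacement.
    """

    def __init__(self):
        self._sets = {}

    def zadd(self, name, mapping):
        zset = self._sets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zpopmin(self, name, count=1):
        zset = self._sets.get(name, {})
        # Sorted sets order by score, then lexicographically by member.
        ordered = sorted(zset.items(), key=lambda item: (item[1], item[0]))
        popped = ordered[:count]
        for member, _score in popped:
            del zset[member]
        return popped


QUEUE_KEY = "tasks:pending"  # hypothetical key name


def enqueue_task(client, task_id, priority):
    """Add a task; a lower score means higher priority (assumed)."""
    client.zadd(QUEUE_KEY, {task_id: priority})


def claim_next_task(client):
    """Pop the highest-priority task ID, or return None if the set is empty."""
    popped = client.zpopmin(QUEUE_KEY, 1)
    return popped[0][0] if popped else None


client = InMemoryZSetClient()
enqueue_task(client, "low-priority-task", 5)
enqueue_task(client, "critical-task", 1)
print(claim_next_task(client))  # critical-task
```

Because every executor node pops from the same sorted set, and a pop removes the member it returns, two nodes cannot claim the same task and the most urgent task is always handed out first.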

3. Manage dependencies

Before processing a task, the Executor service first validates its dependencies and checks whether they have finished. If they have not, the current task is put on hold; otherwise the task is processed.

Multiple Execution service instances would ensure the execution order by fetching this data from Redis.

Trade offs/Tech choices

Consistency or Availability

Making sure a task executes successfully even when errors occur is more important than strong consistency. It is acceptable for some users to see task updates slightly late, so we prefer availability over consistency.

Failure scenarios/bottlenecks

1. Task execution failure

Due to consumer crashes - Failed tasks are redelivered to the message queue so that other consumers can pick them up and process them again.

Due to logic errors - Failed tasks are retried a configured number of times; if a task still fails, update its status and notify the user.
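
The retry path for logic errors might look like the sketch below. The function `execute_with_retries`, the status strings, and the `notify` callback (standing in for the Notification Service) are hypothetical names chosen for illustration.

```python
def execute_with_retries(task, max_retries, notify):
    """Run task(); retry up to max_retries times, then report the failure.

    task: zero-argument callable that raises an exception on failure.
    notify: callable(status, message), a stand-in for the Notification Service.
    Returns the final status string.
    """
    attempts = 0
    while True:
        try:
            task()
            notify("completed", f"succeeded after {attempts} retries")
            return "completed"
        except Exception as exc:
            if attempts >= max_retries:
                # Retries exhausted: record the failure and tell the user.
                notify("failed", str(exc))
                return "failed"
            attempts += 1
            notify("retrying", f"retry {attempts} of {max_retries}: {exc}")


def always_fails():
    raise ValueError("bad query")


status = execute_with_retries(always_fails, max_retries=2,
                              notify=lambda s, m: print(s, "-", m))
print(status)  # failed
```

A production version would usually wait between attempts (for example with exponential backoff) instead of retrying immediately.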

Future improvements

1. Users can investigate any potential issues through Kibana

2. We can build a powerful monitoring service to visualize system metrics and become aware of potential issues in advance