Design A Task Scheduler - System Design

System requirements

Functional:

Task creation - create/delete
Task execution - start/stop
Task organization - a task can be executed after the completion of a previous task
Task prioritization - critical tasks can be executed ahead of less critical tasks
Handling recurring tasks
User notification - completion, failure, retry, etc.
Configurable latency - users can define an acceptable latency for task execution.

Non-Functional:

Fault tolerance. Implement mechanisms for seamless failover to handle master or worker node failures without service disruption.
Scalability. The system should be able to handle millions of tasks executing concurrently (see the capacity estimation).
Reliability. The system should be highly reliable so that tasks execute as expected.

Capacity estimation

Task creation 10/s
Task execution 10000/s
Concurrent tasks: assuming each task takes 5 minutes, 10000/s * 300 s = 3 million tasks run concurrently
Average size of the data created per task is 1 KB, so 10 KB of data is created per second and 10 * 60 * 60 * 24 = 864000 KB = 864 MB per day
If we store data for 2 years, it would be 864 MB * 365 * 2 = 630720 MB, or about 631 GB
Bandwidth usage: 10 KB * 8 * 3 = 240 Kbps
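
The storage and concurrency arithmetic above can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and it uses decimal units (1 MB = 1000 KB), as the estimates above do.

```python
# Back-of-the-envelope capacity numbers from the estimates above.
CREATE_RATE_PER_S = 10       # task creations per second
EXEC_RATE_PER_S = 10_000     # task executions started per second
TASK_DURATION_S = 5 * 60     # each task is assumed to run for 5 minutes
TASK_SIZE_KB = 1             # average data created per task
RETENTION_DAYS = 365 * 2     # keep data for 2 years

# Tasks in flight = arrival rate * time each task stays in the system.
concurrent_tasks = EXEC_RATE_PER_S * TASK_DURATION_S

kb_per_second = CREATE_RATE_PER_S * TASK_SIZE_KB
mb_per_day = kb_per_second * 60 * 60 * 24 / 1000
gb_total = mb_per_day * RETENTION_DAYS / 1000

print(concurrent_tasks)   # 3000000
print(mb_per_day)         # 864.0
print(round(gb_total))    # 631
```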

API design

User login

POST /api/v1/user/login

Params userName/email, password

Return status, accessToken, refreshToken

Task creation

POST /api/v1/task/create

Params accessToken, taskName, taskDesc, maxRetries, userId, priority, associations (upstream or downstream tasks of the current task), payload, code to be executed (submitted through the page; could be a SQL query or a code snippet)

Task execution

POST /api/v1/task/execute

Params accessToken, taskId, dateTime

Task cancelation

POST /api/v1/task/cancel

Params accessToken, taskId

Database design

Task metadata table

taskId, taskName, payload, maxRetries, priority, associations, taskDesc, codeSnippet, createTime.

Task execution table

taskId, startTime, stopTime, duration, status, errorMessage

A relational database is well suited to managing task dependencies, and it makes optimistic locking straightforward (for example, a version column checked in a conditional UPDATE) to prevent race conditions, so we'll choose a relational database for our design.
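
A minimal sketch of the task execution table with optimistic locking, using Python's built-in sqlite3 module as a stand-in for the production database. The version column and the helper name try_update_status are assumptions for illustration; they are not part of the schema listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.execute(
    """
    CREATE TABLE task_execution (
        taskId       TEXT PRIMARY KEY,
        startTime    TEXT,
        stopTime     TEXT,
        duration     REAL,
        status       TEXT NOT NULL,
        errorMessage TEXT,
        version      INTEGER NOT NULL DEFAULT 0
    )
    """
)  # "version" is an assumed extra column used only for optimistic locking
conn.execute("INSERT INTO task_execution (taskId, status) VALUES ('t1', 'PENDING')")


def try_update_status(conn, task_id, new_status, expected_version):
    """Update the row only if nobody else changed it since we read it."""
    cur = conn.execute(
        "UPDATE task_execution SET status = ?, version = version + 1 "
        "WHERE taskId = ? AND version = ?",
        (new_status, task_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means another writer won the race


# Two workers both read version 0, then race to claim the task.
first = try_update_status(conn, "t1", "EXECUTING", expected_version=0)
second = try_update_status(conn, "t1", "EXECUTING", expected_version=0)
print(first, second)  # True False
```

The losing writer sees zero affected rows and can re-read the row and decide whether to retry or give up.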

To further speed up processing, keep latency minimal and reduce pressure on the database, we can store hot data in a cache such as Redis or Memcached.

We also use Elasticsearch to store log data and visualize it through Kibana, so we can easily check the logs and track down issues.

High-level design

Show as diagram

Auth Service - Handling token generation

Scheduling Service - Receives and processes task creation and execution requests, persists the metadata to the database and publishes the task to the message queue

Execution Service - Consumes tasks from the message queue and processes them based on priorities and dependencies

Notification Service - Notifies users with exception logs if a task fails

Canal - Propagates database changes to the cache in real time

Monitor Service - Gathers system and performance metrics and reports them to Grafana

Grafana - Visualize system metrics

Elasticsearch - Stores log info

Kibana - Visualize log info

Request flows

Authentication flow

User makes a request -> rate limiter checks the limits -> request is forwarded to the Auth Service if it passes -> token is generated and returned
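
The design does not say which algorithm the rate limiter uses. As one possible illustration, here is a minimal token bucket sketch; the class name, the capacity and rate values, and the explicit now argument (used to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # forward the request to the next service
        return False       # reject, e.g. with HTTP 429


bucket = TokenBucket(capacity=2, rate=1.0)
results = [bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 1.5)]
print(results)  # [True, True, False, True]
```

In a real deployment the caller would pass a monotonic clock reading, and the bucket state would be kept per user or per API key.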

Task creation flow

User creates a task -> rate limiter checks the limits -> load balancer routes the request to the Scheduling Service if it passes -> if the token is valid, the metadata is persisted to the database

Task execution flow

User starts a task -> rate limiter checks the limits -> load balancer routes the request to the Scheduling Service if it passes -> if the token is valid, the task is pushed to the queue and its status is updated to executing

The Execution Service takes the task and puts it into an internal priority queue -> polls the highest-priority task and checks its dependencies -> if it depends on another task, checks the status of the upstream task -> starts processing if the upstream task has finished; otherwise puts this task on hold and processes the next one.

Detailed component design

1.Handle 3 million tasks running concurrently

All backend services are horizontally scalable and deployed as multiple instances to handle large numbers of tasks.

The Scheduling Service and the Execution Service fetch task info directly from the cache to keep latency low and avoid overwhelming the database.

The cache is also deployed in distributed mode, such as Redis Cluster, to handle millions of requests.

We've adopted a relational database in our design; if database performance becomes the bottleneck, we can consider sharding the database by task ID.
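
A minimal sketch of choosing a shard from a task ID. The shard count and the function name shard_for are assumptions; a cryptographic digest is used only because it is stable across processes, unlike Python's built-in hash() for strings.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count


def shard_for(task_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a task ID to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("task-42"))
```

Plain modulo placement moves most keys when the shard count changes; consistent hashing is a common alternative if resharding is expected.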

2.Tasks executed at most once

We should configure the message queue so that each message is consumed at most once, meaning a given task is executed only once, by a single consumer.
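
Queue settings aside, a consumer can also guard against running the same task twice by claiming the task ID before executing it. The sketch below is an assumption layered on top of the design: ClaimRegistry is an in-memory stand-in, and with multiple nodes the claim would have to live in a shared store such as the database or Redis.

```python
import threading


class ClaimRegistry:
    """In-memory stand-in for a shared store that records which tasks were claimed."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, task_id):
        # Check-and-insert happens under one lock, so only one caller can win.
        with self._lock:
            if task_id in self._claimed:
                return False
            self._claimed.add(task_id)
            return True


def handle_message(registry, task_id, run):
    """Run the task only if this consumer is the first to claim it."""
    if not registry.try_claim(task_id):
        return "skipped"    # duplicate delivery: do not execute again
    run(task_id)
    return "executed"


registry = ClaimRegistry()
executed = []
outcomes = [handle_message(registry, "t1", executed.append) for _ in range(3)]
print(outcomes, executed)  # ['executed', 'skipped', 'skipped'] ['t1']
```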

3.Manage priorities

In one node: once a message is consumed by the Execution Service, it is put into a priority queue. An available worker then polls messages out and processes them in priority order.
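
A minimal sketch of the single-node priority queue, built on Python's heapq. It assumes a lower priority number means a more critical task; the class and task names are illustrative.

```python
import heapq
import itertools


class LocalPriorityQueue:
    """Min-heap of tasks; a lower priority number is treated as more critical."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def push(self, priority, task_id):
        heapq.heappush(self._heap, (priority, next(self._seq), task_id))

    def pop(self):
        _priority, _seq, task_id = heapq.heappop(self._heap)
        return task_id

    def __len__(self):
        return len(self._heap)


queue = LocalPriorityQueue()
queue.push(5, "report")
queue.push(1, "billing")
queue.push(5, "cleanup")
order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['billing', 'report', 'cleanup']
```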

Multiple nodes: each service instance is unaware of the priorities held by the others, so low-priority tasks might be executed before high-priority ones.

We can leverage a Redis ZSet to keep the tasks ordered by priority; the Execution Service would consult Redis before starting to process a task.
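
A sketch of the shared priority queue idea using the sorted-set calls zadd and zpopmin. To keep the example runnable without a Redis server, FakeSortedSetClient is a small in-memory stand-in written for this sketch; the key name and helper functions are hypothetical, and priority is assumed to be a number where lower means more critical.

```python
class FakeSortedSetClient:
    """Tiny in-memory stand-in exposing the two sorted-set calls used below."""

    def __init__(self):
        self._sets = {}

    def zadd(self, name, mapping):
        zset = self._sets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zpopmin(self, name, count=1):
        zset = self._sets.get(name, {})
        # Lowest score first; ties are broken by member name.
        lowest = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))[:count]
        for member, _score in lowest:
            del zset[member]
        return lowest


QUEUE_KEY = "tasks:ready"  # hypothetical key name


def enqueue(client, task_id, priority):
    # Score = priority, so the most critical task (lowest number) pops first.
    client.zadd(QUEUE_KEY, {task_id: priority})


def next_task(client):
    popped = client.zpopmin(QUEUE_KEY, 1)
    return popped[0][0] if popped else None


# With redis-py this could instead be redis.Redis(decode_responses=True).
client = FakeSortedSetClient()
enqueue(client, "cleanup", 9)
enqueue(client, "billing", 1)
enqueue(client, "report", 5)
print(next_task(client), next_task(client))  # billing report
```

Because the pop is a single atomic command on the Redis side, two Execution Service instances cannot receive the same task from the set.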

4.Manage dependencies

Before processing a task, the Execution Service first checks whether its dependencies have finished. If they have not, the current task is put on hold; otherwise the task is processed.

Multiple Execution Service instances ensure the execution sequence by fetching task status from Redis.
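
The dependency check can be sketched as follows. The status value FINISHED, the shape of the upstream mapping and the function names are assumptions; plain dictionaries stand in for the status data that would be read from Redis.

```python
FINISHED = "FINISHED"


def ready_to_run(task_id, upstream, status):
    """A task is ready when every upstream task it depends on has finished."""
    return all(status.get(dep) == FINISHED for dep in upstream.get(task_id, ()))


def schedule_pass(pending, upstream, status):
    """Split pending tasks into those to run now and those to keep on hold."""
    run_now, on_hold = [], []
    for task_id in pending:
        if ready_to_run(task_id, upstream, status):
            run_now.append(task_id)
        else:
            on_hold.append(task_id)
    return run_now, on_hold


upstream = {"load": ["extract"], "report": ["load"]}  # task -> tasks it waits for
status = {"extract": FINISHED, "load": "EXECUTING"}
run_now, on_hold = schedule_pass(["load", "report", "archive"], upstream, status)
print(run_now, on_hold)  # ['load', 'archive'] ['report']
```

Tasks left on hold are reconsidered on a later pass, once their upstream tasks report a finished status.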

5.Handle recurring tasks

When creating a task, the user can configure a cron expression, and the backend system ensures the task runs at the scheduled times.
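
To illustrate how a scheduler might turn a cron expression into the next run time, here is a deliberately simplified sketch. It handles only five-field expressions with '*', '*/n', single numbers and comma lists; ranges and names are not supported, and it requires both day-of-month and day-of-week to match, which differs from standard cron when both are restricted. A production system would more likely rely on an existing cron library.

```python
from datetime import datetime, timedelta


def _matches(field, value, minimum):
    """Check one cron field; supports '*', '*/n', single numbers and comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def next_run(expr, after):
    """Return the first minute strictly after `after` that matches `expr`."""
    minute, hour, dom, month, dow = expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one year ahead
        if (_matches(minute, t.minute, 0) and _matches(hour, t.hour, 0)
                and _matches(dom, t.day, 1) and _matches(month, t.month, 1)
                and _matches(dow, t.isoweekday() % 7, 0)):  # cron uses 0 = Sunday
            return t
        t += timedelta(minutes=1)
    raise ValueError("no matching time within one year")


start = datetime(2024, 1, 1, 10, 7)       # a Monday
print(next_run("*/15 * * * *", start))    # 2024-01-01 10:15:00
print(next_run("0 9 * * 1", start))       # 2024-01-08 09:00:00
```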

Trade offs/Tech choices

Consistency or Availability

Making sure a task executes successfully even when errors occur is more important than strong consistency. It is acceptable for some users to see task updates slightly late, so we prefer availability over consistency.

Failure scenarios/bottlenecks

1.Task execution failure

Due to consumer crashes - Failed tasks are redelivered by the message queue so that other consumers can pick them up and consume them again.

Due to logic errors - Failed tasks are retried up to the configured number of times (maxRetries); if they still fail, the status is updated and the user is notified.
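
The retry behaviour can be sketched as below. The function name, the status strings and the notify callback are assumptions; max_retries corresponds to the maxRetries task parameter.

```python
def run_with_retries(task, max_retries, notify):
    """Run `task`; on failure retry up to `max_retries` more times, then notify."""
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return "SUCCEEDED", attempts
        except Exception as exc:  # a real executor would catch narrower errors
            if attempts > max_retries:
                notify(f"task failed after {attempts} attempts: {exc}")
                return "FAILED", attempts


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")


messages = []
result = run_with_retries(flaky, max_retries=3, notify=messages.append)
print(result, messages)  # ('SUCCEEDED', 3) []
```

A delay between attempts, such as exponential backoff, is usually added so that a struggling dependency is not hit again immediately.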

2.Backend service crashes

We've designed a highly scalable system in which every backend service runs more than one instance; if an instance crashes, it is evicted from the cluster temporarily and the other instances take over its work.

3.Cache not available

To avoid a cache SPOF we have to make the cache highly available; we can deploy it as a Redis Cluster to achieve that.

4.Database not available

We can use single-leader replication to achieve read scalability and high read performance, with followers available to take over if the leader fails; if write performance is still the bottleneck of our system, we can consider sharding based on taskId.

Future improvements

1.Users can investigate any potential issues through Kibana

2.We can build a powerful monitor service to visualize the system metrics and become aware of potential issues in advance