System requirements
Functional:
- Schedule tasks to execute
- Single execution jobs
- CRON jobs
Non-Functional:
- High availability
Capacity estimation
- Small script -> A few KB
- Medium script -> Hundreds of KB
- Large script -> A few MB
Task Table:
- Time to execute -> 4KB
- UserID -> 4KB
- Status -> 10KB
~30KB per row
User Table:
- First Name -> 10KB
- Last Name -> 10KB
- UserID -> 4KB
~25KB per row
On average, 5 to 20 tasks per user per day
100,000 users per day
Worst case: 100,000 users * 20 tasks = 2,000,000 tasks per day
2,000,000 * 30KB = 60,000,000KB a day -> 60GB per day inserted into the Task Table
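The arithmetic above can be checked with a short sketch. It uses only the figures already stated (100,000 users, 5 to 20 tasks per user, ~30KB per task row) and assumes decimal units, where 1GB = 1,000,000KB. The function and constant names are illustrative.

```python
# Figures from the estimate above; names are illustrative.
USERS_PER_DAY = 100_000
MIN_TASKS_PER_USER = 5
MAX_TASKS_PER_USER = 20
TASK_ROW_KB = 30


def daily_task_storage_gb(users: int, tasks_per_user: int, row_kb: int) -> float:
    """Return daily Task table growth in GB (assumes 1 GB = 1,000,000 KB)."""
    total_kb = users * tasks_per_user * row_kb
    return total_kb / 1_000_000


# Upper bound: every user schedules 20 tasks.
worst_case_gb = daily_task_storage_gb(USERS_PER_DAY, MAX_TASKS_PER_USER, TASK_ROW_KB)
# Lower bound: every user schedules 5 tasks.
best_case_gb = daily_task_storage_gb(USERS_PER_DAY, MIN_TASKS_PER_USER, TASK_ROW_KB)
```

Under these assumptions the daily growth falls between 15GB and 60GB, so the 60GB figure is the upper bound.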
API design
External:
- POST /sendTask
- Takes in script name
- Takes in time it should execute
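As a rough sketch of what the endpoint could accept, the request body can be modeled as a small validated structure. The field names `script_name` and `execute_at`, and the choice of ISO-8601 timestamps, are assumptions for illustration; the notes above only say that a script name and an execution time are provided.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SendTaskRequest:
    script_name: str       # hypothetical field name
    execute_at: datetime   # hypothetical field name


def parse_send_task(body: dict) -> SendTaskRequest:
    """Validate a decoded JSON body for POST /sendTask."""
    name = body.get("script_name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("script_name is required")

    raw_time = body.get("execute_at")
    if not isinstance(raw_time, str):
        raise ValueError("execute_at is required")

    # Raises ValueError on malformed timestamps.
    execute_at = datetime.fromisoformat(raw_time)
    if execute_at.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        execute_at = execute_at.replace(tzinfo=timezone.utc)

    return SendTaskRequest(name.strip(), execute_at)
```

For example, `parse_send_task({"script_name": "backup.sh", "execute_at": "2030-01-01T12:00:00+00:00"})` returns a request object, while a missing or empty script name raises `ValueError`.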
Database design
Task Table:
- Time to execute -> 4KB
- UserID -> 4KB
- Status -> 10KB
- Variables -> 20KB
User Table:
- First Name -> 10KB
- Last Name -> 10KB
- UserID -> 4KB
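The two tables could be sketched as follows, using SQLite only because it ships with Python. The `task_id` key, the column types, the `PENDING` default, and the index are assumptions that the notes above do not specify.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE tasks (
    task_id    INTEGER PRIMARY KEY,                         -- assumed surrogate key
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    execute_at INTEGER NOT NULL,                            -- assumed: Unix seconds
    status     TEXT NOT NULL DEFAULT 'PENDING',             -- assumed status value
    variables  TEXT                                         -- assumed: JSON-encoded arguments
);
-- Assumed index to support the "find due tasks" read path.
CREATE INDEX idx_tasks_due ON tasks (status, execute_at);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the User and Task tables on the given connection."""
    conn.executescript(SCHEMA)
```

Indexing on `(status, execute_at)` is one way to keep the frequent "pending and due" lookup from scanning the whole table.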
High-level design
Client sends task to POST endpoint
- Add task to Tasks table
Read tasks from DB -> find tasks that need to be executed -> tasks whose execute time has passed -> normalize execute times to 10-second intervals
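One way to read the 10-second normalization is sketched below. It assumes execute times are Unix seconds and are rounded up to the next 10-second boundary, so a task never runs early and is picked up less than 10 seconds late. The rounding direction, the `PENDING` status value, and the dictionary shape are assumptions.

```python
BUCKET_SECONDS = 10


def normalize(timestamp: int) -> int:
    """Round a Unix timestamp up to the next 10-second boundary."""
    return ((timestamp + BUCKET_SECONDS - 1) // BUCKET_SECONDS) * BUCKET_SECONDS


def find_due_tasks(tasks: list, now: int) -> list:
    """Return pending tasks whose normalized execute time has passed."""
    return [
        task
        for task in tasks
        if task["status"] == "PENDING" and normalize(task["execute_at"]) <= now
    ]
```

With this rounding, a task scheduled for second 107 becomes due at second 110, together with every other task in the same interval, so the poller can fetch whole intervals at once.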
Because the system will be read-heavy, we want read replicas to reduce the load on the primary database
Shard the database -> shard by task execution time -> further split the shards that hold an overwhelming number of tasks
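A minimal sketch of time-based shard routing with a second-level split for hot shards. The one-hour window, the 24 time shards, and the use of the user ID as the sub-shard key are all assumptions chosen for illustration.

```python
import zlib

SHARD_WINDOW_SECONDS = 3600   # assumed: one shard per hour of execution time
NUM_TIME_SHARDS = 24          # assumed: shards wrap around daily


def shard_for(execute_at: int, user_id: int, hot_splits: dict) -> str:
    """Return a shard name for a task.

    hot_splits maps a time shard index to the number of sub-shards it
    has been split into; shards not listed are left unsplit.
    """
    time_shard = (execute_at // SHARD_WINDOW_SECONDS) % NUM_TIME_SHARDS
    splits = hot_splits.get(time_shard)
    if not splits:
        return str(time_shard)
    # Stable hash so the same user always lands in the same sub-shard.
    sub_shard = zlib.crc32(str(user_id).encode("utf-8")) % splits
    return f"{time_shard}-{sub_shard}"
```

A known trade-off of sharding by time is that the shard covering "now" receives most of the reads, which is why the sketch allows an overloaded shard to be split further.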
Find tasks that need to be executed
Enqueue the tasks by priority
If a task fails, its priority increases
The higher the priority, the stronger the executor node the script is sent to
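The priority rules above can be sketched with a heap-based queue. The integer priority scale starting at 0, the increment of 1 per failure, and the mapping from priority to node capacity are assumptions; the notes only state that failures raise priority and that higher priority goes to stronger nodes.

```python
import heapq
import itertools


class TaskQueue:
    """Max-priority queue; ties are served in insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker within one priority

    def enqueue(self, task_id: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), task_id))

    def dequeue(self):
        neg_priority, _, task_id = heapq.heappop(self._heap)
        return task_id, -neg_priority

    def requeue_failed(self, task_id: str, priority: int) -> None:
        """Put a failed task back with its priority raised by one."""
        self.enqueue(task_id, priority + 1)


def pick_node(priority: int, nodes: list) -> str:
    """Pick an executor: nodes are (name, capacity) pairs, and a higher
    priority selects a higher-capacity node, capped at the strongest."""
    ranked = sorted(nodes, key=lambda node: node[1])
    return ranked[min(priority, len(ranked) - 1)][0]
```

One consequence worth noting: a task that keeps failing will keep climbing in priority, so a retry limit would likely be needed to stop it from starving other work.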
Keep track of whether executor nodes are available via heartbeats -> use ZooKeeper to track them
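The heartbeat idea can be illustrated with an in-memory stand-in. This is not the ZooKeeper API; with ZooKeeper, each executor would typically hold an ephemeral znode that disappears when its session ends. The class name, the 15-second timeout, and the explicit `now` argument are assumptions made to keep the sketch testable.

```python
class HeartbeatMonitor:
    """In-memory stand-in for heartbeat-based liveness tracking."""

    def __init__(self, timeout_seconds: float = 15.0):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # node_id -> time of last heartbeat

    def record(self, node_id: str, now: float) -> None:
        """Record a heartbeat from an executor node."""
        self._last_seen[node_id] = now

    def available_nodes(self, now: float) -> list:
        """Return nodes whose last heartbeat is within the timeout."""
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )
```

The scheduler would consult `available_nodes` before dispatching, so tasks are only sent to executors that have reported in recently.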
Request flows
Detailed component design
Trade offs/Tech choices
Failure scenarios/bottlenecks
Future improvements