How to limit the number of simultaneously running jobs of a certain type?

Introduction

If only certain job types need concurrency limits, the usual design is a queue plus a per-type semaphore or token counter. That way, jobs of type A can be capped independently from jobs of type B, instead of applying one global limit to the whole system.

Model the Limit Per Job Type

Suppose you have jobs such as video, report, and import. If video jobs are expensive, you might allow only two of them to run at the same time while other job types have different limits.

In Python with asyncio, that looks like this:

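```python
import asyncio

# Maximum number of concurrently running jobs per type.
limits = {
    "video": 2,
    "report": 5,
}

# One semaphore per job type, sized by that type's limit.
semaphores = {job_type: asyncio.Semaphore(limit) for job_type, limit in limits.items()}

def get_semaphore(job_type: str) -> asyncio.Semaphore:
    # Types without a configured limit (such as "import" here)
    # fall back to one job at a time.
    if job_type not in semaphores:
        semaphores[job_type] = asyncio.Semaphore(1)
    return semaphores[job_type]
```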

The semaphore acts as the concurrency gate for each category.

Run Jobs Through the Gate

Each job acquires the semaphore for its own type before starting work:

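```python
async def run_job(job_type: str, job_id: int) -> None:
    sem = get_semaphore(job_type)

    # Wait for a free slot of this type. The slot is released on exit,
    # including when the body raises or the task is cancelled.
    async with sem:
        print(f"start {job_type} {job_id}")
        await asyncio.sleep(1)  # placeholder for the real work
        print(f"end {job_type} {job_id}")
```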

If ten video jobs arrive, only two run immediately. The rest wait. Meanwhile, report jobs can still use their own separate pool.
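To check that behavior rather than take it on faith, a small self-contained sketch can record the peak number of jobs running per type. The names `LIMITS`, `demo`, `running`, and `peak` are illustrative and not part of the code above, and the short sleep stands in for real work.

```python
import asyncio
from collections import Counter

LIMITS = {"video": 2, "report": 5}  # illustrative limits


async def run_job(job_type: str, sems: dict, running: Counter, peak: Counter) -> None:
    async with sems[job_type]:
        running[job_type] += 1
        peak[job_type] = max(peak[job_type], running[job_type])
        try:
            await asyncio.sleep(0.01)  # stand-in for real work
        finally:
            running[job_type] -= 1


async def demo() -> dict:
    # Semaphores are created inside the running event loop.
    sems = {t: asyncio.Semaphore(n) for t, n in LIMITS.items()}
    running, peak = Counter(), Counter()
    jobs = [run_job("video", sems, running, peak) for _ in range(10)]
    jobs += [run_job("report", sems, running, peak) for _ in range(10)]
    await asyncio.gather(*jobs)
    return dict(peak)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The recorded peak for each type should equal its configured limit, which shows that the two pools do not interfere with each other.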

Keep Scheduling and Execution Separate

A good architecture separates:

  • job admission
  • per-type concurrency control
  • actual job execution

That makes it easier to change limits later without rewriting the whole runner. The scheduler decides what enters the system, while the concurrency gate decides how many jobs of each type may execute simultaneously.
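One way to sketch that separation, assuming hypothetical names `TypeGate`, `execute`, and `Scheduler` that do not appear elsewhere in this article:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Handler = Callable[[int], Awaitable[None]]


class TypeGate:
    """Per-type concurrency control. Knows limits, not what jobs do."""

    def __init__(self, limits: Dict[str, int], default: int = 1) -> None:
        self._limits = limits
        self._default = default
        self._sems: Dict[str, asyncio.Semaphore] = {}

    def slot(self, job_type: str) -> asyncio.Semaphore:
        # Create the semaphore lazily so changing limits stays a config concern.
        if job_type not in self._sems:
            limit = self._limits.get(job_type, self._default)
            self._sems[job_type] = asyncio.Semaphore(limit)
        return self._sems[job_type]


async def execute(gate: TypeGate, job_type: str, job_id: int, handler: Handler) -> None:
    """Execution: run the handler once a slot for its type is free."""
    async with gate.slot(job_type):
        await handler(job_id)


class Scheduler:
    """Admission: decides which jobs enter the system at all."""

    def __init__(self, gate: TypeGate, handlers: Dict[str, Handler]) -> None:
        self._gate = gate
        self._handlers = handlers

    def admit(self, job_type: str, job_id: int) -> Optional[asyncio.Task]:
        # Must be called from inside a running event loop.
        handler = self._handlers.get(job_type)
        if handler is None:
            return None  # unknown job type: not admitted
        return asyncio.create_task(execute(self._gate, job_type, job_id, handler))
```

With this split, changing a limit touches only the dictionary passed to `TypeGate`, and changing admission rules touches only `Scheduler.admit`.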

Metrics Matter Too

If you enforce per-type limits, expose queue length and running-count metrics per type. Otherwise you know a limit exists but cannot tell whether the system is healthy or just backlogged.
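A minimal sketch of those two metrics, assuming a hypothetical `MeteredGate` wrapper that counts waiting and running jobs per type. In a real system the counters would feed whatever metrics backend is already in use; here `snapshot()` just returns a dictionary.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Dict


class MeteredGate:
    def __init__(self, limits: Dict[str, int]) -> None:
        # Construct this inside a running event loop.
        self._sems = {t: asyncio.Semaphore(n) for t, n in limits.items()}
        self.waiting: Counter = Counter()  # queue length per type
        self.running: Counter = Counter()  # running count per type

    async def run(self, job_type: str, job: Callable[[], Awaitable]):
        sem = self._sems[job_type]
        self.waiting[job_type] += 1
        try:
            await sem.acquire()
        finally:
            # No longer queued, whether the slot was acquired or the wait was cancelled.
            self.waiting[job_type] -= 1
        self.running[job_type] += 1
        try:
            return await job()
        finally:
            self.running[job_type] -= 1
            sem.release()

    def snapshot(self) -> dict:
        return {
            t: {"waiting": self.waiting[t], "running": self.running[t]}
            for t in self._sems
        }
```

A steadily rising `waiting` value with `running` pinned at the limit points to backlog, while both values near zero indicate spare capacity.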

Use the Same Pattern Outside asyncio

The idea is not specific to Python. In other environments you would use:

  • a semaphore per type
  • a worker pool per type
  • a scheduler with type-based concurrency slots

The principle stays the same: the limit is attached to the job category, not just to the whole process.
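As an illustration of the worker-pool-per-type option, here is a thread-based sketch using the standard library's `ThreadPoolExecutor`, where the pool size itself is the limit. The `LIMITS` values, the `demo` function, and the peak-tracking bookkeeping are assumptions made for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LIMITS = {"video": 2, "report": 5}  # illustrative limits


def demo() -> dict:
    # One pool per job type: max_workers caps concurrency for that type.
    pools = {t: ThreadPoolExecutor(max_workers=n) for t, n in LIMITS.items()}
    lock = threading.Lock()
    running = {t: 0 for t in LIMITS}
    peak = {t: 0 for t in LIMITS}

    def job(job_type: str) -> None:
        with lock:
            running[job_type] += 1
            peak[job_type] = max(peak[job_type], running[job_type])
        time.sleep(0.01)  # stand-in for real work
        with lock:
            running[job_type] -= 1

    futures = [pools[t].submit(job, t) for t in ["video"] * 6 + ["report"] * 6]
    for future in futures:
        future.result()  # re-raises any exception from the job
    for pool in pools.values():
        pool.shutdown()
    return peak


if __name__ == "__main__":
    print(demo())
```

Excess jobs wait in each pool's internal queue, so a burst of video jobs never occupies a thread that belongs to the report pool.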

Decide What Happens to Excess Jobs

Once the limit is reached, extra jobs need a policy. Common options are:

  • wait in a queue
  • reject immediately
  • retry later
  • move to a lower-priority queue

The right choice depends on whether backlog is acceptable. For background processing, queueing is common. For user-triggered tasks, rejection or fast retry may be better.
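Two of these policies can be sketched with plain `asyncio.Semaphore` calls. The `Rejected` exception and the helper names `run_or_reject` and `run_with_wait` are hypothetical. Note also that some Python versions have had edge cases when `asyncio.wait_for` times out at the same moment the acquire succeeds, so treat the bounded-wait variant as a starting point rather than a hardened implementation.

```python
import asyncio
from typing import Awaitable, Callable


class Rejected(Exception):
    """Raised when a job is not allowed to start."""


async def run_or_reject(sem: asyncio.Semaphore, job: Callable[[], Awaitable]):
    # Reject immediately: there is no suspension point between the check and
    # the acquire, so another task cannot take the slot in between.
    if sem.locked():
        raise Rejected("no free slot")
    async with sem:
        return await job()


async def run_with_wait(sem: asyncio.Semaphore, job: Callable[[], Awaitable], max_wait: float):
    # Queue, but only for a bounded time; the caller can retry later.
    try:
        await asyncio.wait_for(sem.acquire(), timeout=max_wait)
    except asyncio.TimeoutError:
        raise Rejected("timed out waiting for a slot") from None
    try:
        return await job()
    finally:
        sem.release()
```

A caller handling a user-triggered task might catch `Rejected` and return a "busy, try again" response, while a background worker would more likely use a plain `async with sem` and simply wait.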

Queue Growth Should Influence the Limit

If the queue for one job type keeps growing while worker slots stay full, that is a sign the limit may be too low or the underlying job cost is too high. Concurrency limits are policy settings, not values you choose once and ignore forever.
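`asyncio.Semaphore` has no method for changing its size after creation, so a limit that can be tuned at runtime needs a different primitive. The following is a rough sketch built on `asyncio.Condition`; the class name `AdjustableLimit` and its methods are assumptions for illustration, not an established API.

```python
import asyncio


class AdjustableLimit:
    """A concurrency gate whose limit can be changed while jobs are running."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._running = 0
        self._cond = asyncio.Condition()

    async def acquire(self) -> None:
        async with self._cond:
            # Sleep until a slot is free under the current limit.
            await self._cond.wait_for(lambda: self._running < self._limit)
            self._running += 1

    async def release(self) -> None:
        async with self._cond:
            self._running -= 1
            self._cond.notify_all()

    async def set_limit(self, limit: int) -> None:
        async with self._cond:
            self._limit = limit
            # Wake waiters so they re-check against the new limit.
            self._cond.notify_all()
```

Raising the limit lets waiting jobs start right away. Lowering it does not interrupt running jobs; it only delays new ones until the running count drops below the new value.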

Common Pitfalls

  • Applying one global concurrency limit when the real problem is only one expensive job type.
  • Using per-type limits without defining what happens when the queue grows too large.
  • Letting one noisy job type monopolize workers that should be reserved for other work.
  • Mixing scheduling logic and execution logic so concurrency rules become hard to change.
  • Forgetting to release the slot when a job fails or is cancelled.
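The last pitfall is the easiest to show in code. When a slot is acquired manually instead of with `async with`, the release belongs in a `finally` block so that it also runs on exceptions and cancellation. The helper name `guarded` and the small demo are illustrative.

```python
import asyncio
from typing import Awaitable, Callable


async def guarded(sem: asyncio.Semaphore, job: Callable[[], Awaitable]):
    await sem.acquire()
    try:
        return await job()
    finally:
        sem.release()  # runs on success, on exceptions, and on cancellation


async def demo() -> tuple:
    sem = asyncio.Semaphore(1)

    async def failing():
        raise RuntimeError("job failed")

    async def slow():
        await asyncio.sleep(10)

    try:
        await guarded(sem, failing)
    except RuntimeError:
        pass
    free_after_error = not sem.locked()

    task = asyncio.create_task(guarded(sem, slow))
    await asyncio.sleep(0)  # let the task start and take the slot
    held_while_running = sem.locked()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    free_after_cancel = not sem.locked()
    return free_after_error, held_while_running, free_after_cancel


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the `finally`, each failed or cancelled job would permanently consume one slot, and the effective limit for that type would shrink until no job could start.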

Summary

  • Limit concurrency by job type, not only globally, when different workloads have different cost profiles.
  • A per-type semaphore or token pool is a simple way to implement that limit.
  • Queue, reject, or retry excess jobs based on product needs.
  • Keep scheduling separate from execution so concurrency policy stays maintainable.
  • Always release the slot correctly, even on failure paths.
