How to limit the number of simultaneously running jobs of a certain type?
Introduction
If only certain job types need concurrency limits, the usual design is a queue plus a per-type semaphore or token counter. That way, jobs of type A can be capped independently from jobs of type B, instead of applying one global limit to the whole system.
Model the Limit Per Job Type
Suppose you have jobs such as video, report, and import. If video jobs are expensive, you might allow only two of them to run at the same time while other job types have different limits.
In Python with asyncio, each job type can get its own asyncio.Semaphore, sized to that type's limit. The semaphore acts as the concurrency gate for each category.
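A minimal sketch of that mapping follows. Only the limit of two for video comes from the text above; the other limits and the names LIMITS and make_gates are assumptions made for illustration.

```python
import asyncio

# Hypothetical limits: only "video": 2 is taken from the article.
LIMITS = {"video": 2, "report": 5, "import": 3}


def make_gates(limits: dict[str, int]) -> dict[str, asyncio.Semaphore]:
    """Create one semaphore per job type, sized to that type's limit."""
    return {job_type: asyncio.Semaphore(limit) for job_type, limit in limits.items()}
```

Calling make_gates from inside a running coroutine (for example one started with asyncio.run) keeps the semaphores tied to the loop that will actually use them, which matters on older Python versions.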
Run Jobs Through the Gate
Each job acquires the semaphore for its own type before starting work and releases it when the work ends, whether it succeeds or fails.
If ten video jobs arrive, only two run immediately. The rest wait. Meanwhile, report jobs can still use their own separate pool.
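The behaviour described above can be sketched as follows. The helper run_job, the fake_video coroutine, and the report limit of five are hypothetical; the demo simply records the highest number of video jobs observed running at once.

```python
import asyncio


async def run_job(gates, job_type, work):
    """Wait for a free slot of this job type, then run the work."""
    async with gates[job_type]:  # the slot is released even if work() raises
        return await work()


async def demo():
    # Created inside the running loop; the report limit is an assumption.
    gates = {"video": asyncio.Semaphore(2), "report": asyncio.Semaphore(5)}
    running = 0
    peak = 0

    async def fake_video():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real work
        running -= 1

    # Ten video jobs arrive at once; the gate admits them two at a time.
    await asyncio.gather(*(run_job(gates, "video", fake_video) for _ in range(10)))
    return peak
```

Using async with rather than separate acquire and release calls is a deliberate choice here: it makes it hard to forget the release on an error path.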
Keep Scheduling and Execution Separate
A good architecture separates:
- job admission
- per-type concurrency control
- actual job execution
That makes it easier to change limits later without rewriting the whole runner. The scheduler decides what enters the system, while the concurrency gate decides how many jobs of each type may execute simultaneously.
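One possible way to lay out those three responsibilities is sketched below. The class TypedRunner and its method names are assumptions, not a prescribed design; the point is only that each concern lives in its own method, so the limits can change without touching the handlers.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[object]]


class TypedRunner:
    """Hypothetical runner that keeps admission, gating, and execution apart."""

    def __init__(self, limits: dict[str, int], handlers: dict[str, Handler]):
        self._gates = {t: asyncio.Semaphore(n) for t, n in limits.items()}
        self._handlers = handlers
        self._tasks: set[asyncio.Task] = set()

    def submit(self, job_type: str, payload: dict) -> asyncio.Task:
        """Admission: decide whether the job enters the system at all."""
        if job_type not in self._gates:
            raise ValueError(f"unknown job type: {job_type}")
        task = asyncio.create_task(self._gated(job_type, payload))
        self._tasks.add(task)  # keep a reference so the task is not collected
        task.add_done_callback(self._tasks.discard)
        return task

    async def _gated(self, job_type: str, payload: dict):
        """Concurrency control: hold one slot of this type while the job runs."""
        async with self._gates[job_type]:
            return await self._execute(job_type, payload)

    async def _execute(self, job_type: str, payload: dict):
        """Execution: run the handler, which knows nothing about limits."""
        return await self._handlers[job_type](payload)
```

Both the constructor and submit are meant to be called from inside a running event loop, since submit uses asyncio.create_task.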
Metrics Matter Too
If you enforce per-type limits, expose queue length and running-count metrics per type. Otherwise you know a limit exists but cannot tell whether the system is healthy or just backlogged.
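As a rough sketch, the two numbers can be tracked with plain counters around the semaphore. MeteredGate and snapshot are hypothetical names, and a real system would likely export these values to whatever metrics library it already uses rather than keeping them in memory like this.

```python
import asyncio
from collections import Counter


class MeteredGate:
    """Per-type semaphores that also track waiting and running counts."""

    def __init__(self, limits: dict[str, int]):
        self._gates = {t: asyncio.Semaphore(n) for t, n in limits.items()}
        self.waiting = Counter()
        self.running = Counter()

    async def run(self, job_type: str, work):
        gate = self._gates[job_type]
        self.waiting[job_type] += 1
        try:
            await gate.acquire()
        finally:
            # No longer queued, whether the slot was acquired or the wait was cancelled.
            self.waiting[job_type] -= 1
        self.running[job_type] += 1
        try:
            return await work()
        finally:
            self.running[job_type] -= 1
            gate.release()

    def snapshot(self) -> dict[str, dict[str, int]]:
        """Return the current queue length and running count for every type."""
        return {
            t: {"waiting": self.waiting[t], "running": self.running[t]}
            for t in self._gates
        }
```

A running count pinned at the limit together with a steadily rising waiting count is the backlogged case; a running count below the limit with nothing waiting means the limit is not the bottleneck.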
Use the Same Pattern Outside asyncio
The idea is not specific to Python or to asyncio. In other environments you would use one of the following:
- a semaphore per type
- a worker pool per type
- a scheduler with type-based concurrency slots
The principle stays the same: the limit is attached to the job category, not just to the whole process.
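To illustrate the worker-pool-per-type variant without asyncio, here is a sketch that stays in Python but uses threads. PerTypePools is a hypothetical wrapper; the cap for each type comes from the max_workers argument of the standard ThreadPoolExecutor.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class PerTypePools:
    """One thread pool per job type; the pool size is that type's limit."""

    def __init__(self, limits: dict[str, int]):
        self._pools = {
            job_type: ThreadPoolExecutor(max_workers=limit, thread_name_prefix=job_type)
            for job_type, limit in limits.items()
        }

    def submit(self, job_type: str, fn, *args) -> Future:
        """Queue the job on its own type's pool; other types are unaffected."""
        return self._pools[job_type].submit(fn, *args)

    def shutdown(self) -> None:
        """Wait for queued jobs to finish, then stop every pool."""
        for pool in self._pools.values():
            pool.shutdown(wait=True)
```

Here the executor's internal queue holds the excess jobs, so no explicit semaphore is needed. A threading.BoundedSemaphore per type around a shared pool would be the semaphore-per-type equivalent.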
Decide What Happens to Excess Jobs
Once the limit is reached, extra jobs need a policy. Common options are:
- wait in a queue
- reject immediately
- retry later
- move to a lower-priority queue
The right choice depends on whether backlog is acceptable. For background processing, queueing is common. For user-triggered tasks, rejection or fast retry may be better.
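Two of these policies can be sketched on top of an asyncio.Semaphore: rejecting immediately, and waiting in the queue only up to a deadline. The exception SlotUnavailable and both helper names are hypothetical, and the caller is assumed to decide what a rejection means (an error response, a retry, or a move to another queue).

```python
import asyncio


class SlotUnavailable(Exception):
    """Raised when a job is turned away instead of being queued."""


async def run_or_reject(gate: asyncio.Semaphore, work):
    """Reject immediately if no slot of this type is free."""
    if gate.locked():
        raise SlotUnavailable("concurrency limit reached")
    # No await happens between the check and the acquire, so no other
    # coroutine can take the slot in between.
    async with gate:
        return await work()


async def run_or_timeout(gate: asyncio.Semaphore, work, max_wait: float):
    """Wait for a slot, but give up after max_wait seconds."""
    try:
        await asyncio.wait_for(gate.acquire(), timeout=max_wait)
    except asyncio.TimeoutError:
        raise SlotUnavailable(f"no slot within {max_wait}s") from None
    try:
        return await work()
    finally:
        gate.release()
```

For background processing, the plain blocking acquire is usually enough. The variants above are aimed at user-triggered work, where a fast failure tends to be more useful than an unbounded wait.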
Queue Growth Should Influence the Limit
If the queue for one job type keeps growing while worker slots stay full, that is a sign the limit may be too low or the underlying job cost is too high. Concurrency limits are policy settings, not values you choose once and ignore forever.
Common Pitfalls
- Applying one global concurrency limit when the real problem is only one expensive job type.
- Using per-type limits without defining what happens when the queue grows too large.
- Letting one noisy job type monopolize workers that should be reserved for other work.
- Mixing scheduling logic and execution logic so concurrency rules become hard to change.
- Forgetting to release the slot when a job fails or is cancelled.
Summary
- Limit concurrency by job type, not only globally, when different workloads have different cost profiles.
- A per-type semaphore or token pool is a simple way to implement that limit.
- Queue, reject, or retry excess jobs based on product needs.
- Keep scheduling separate from execution so concurrency policy stays maintainable.
- Always release the slot correctly, even on failure paths.

