What batch_size and pre_dispatch in joblib exactly mean
Introduction
In joblib.Parallel, batch_size and pre_dispatch control how work is packaged and fed to workers. They do not change what your function computes. They change the scheduling behavior around it. Understanding those two knobs helps when parallel code is slower than expected, uses too much memory, or leaves workers idle.
What batch_size Means
batch_size is the number of individual tasks bundled together and sent to a worker as one scheduling unit.
Suppose you run a function square over a range of integers through Parallel with batch_size=3.
The tasks are logically square(0), square(1), and so on. With batch_size=3, Joblib does not necessarily send one task at a time. It sends them in chunks of three where possible.
That matters because dispatching work has overhead. If each task is tiny, sending one task at a time can waste time on scheduling rather than computation. Batching reduces that overhead.
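The call described above might look like the following minimal sketch. Only the function name square and batch_size=3 come from the text; the body of square, n_jobs=2, and the input range(10) are assumptions made for illustration.

```python
from joblib import Parallel, delayed


def square(x):
    # Assumed body: the text names the function but does not show it.
    return x * x


# Ten logical tasks, handed to workers in chunks of up to three.
results = Parallel(n_jobs=2, batch_size=3)(
    delayed(square)(i) for i in range(10)
)
print(results)
```

Results come back in input order regardless of how the tasks were batched.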
Small Versus Large Batches
Small batches are useful when:
- tasks vary a lot in runtime
- you want better load balancing
- each task is already expensive enough that scheduling overhead is negligible
Large batches are useful when:
- tasks are extremely fast
- overhead dominates runtime
- task durations are fairly uniform
If batches are too large, one worker can end up holding a long chunk of work while others finish earlier and wait. If batches are too small, the scheduler spends too much effort handing out tiny pieces of work.
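The trade-off can be made concrete with a toy simulation. This is not Joblib's scheduler: it assumes each batch goes to whichever worker frees up first, and it charges a fixed, made-up overhead per batch. The function name simulated_makespan and all numbers are hypothetical.

```python
import heapq


def simulated_makespan(durations, batch_size, n_workers, overhead=0):
    """Toy model: finish time when batches go to the first free worker."""
    batches = [
        durations[i:i + batch_size]
        for i in range(0, len(durations), batch_size)
    ]
    free_at = [0] * n_workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for batch in batches:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + overhead + sum(batch))
    return max(free_at)


# Uneven task durations, no overhead: small batches balance better.
uneven = [5, 1, 1, 1, 1, 1, 1, 1]
print(simulated_makespan(uneven, batch_size=1, n_workers=2))  # 6
print(simulated_makespan(uneven, batch_size=4, n_workers=2))  # 8

# Uniform cheap tasks, heavy per-batch overhead: large batches win.
uniform = [1] * 8
print(simulated_makespan(uniform, batch_size=1, n_workers=2, overhead=10))  # 44
print(simulated_makespan(uniform, batch_size=4, n_workers=2, overhead=10))  # 14
```

Real workloads sit somewhere between these two extremes, which is why measuring beats guessing.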
What batch_size="auto" Does
The default batch_size="auto" lets Joblib adjust batching heuristically based on observed execution behavior. The intent is to avoid you having to hand-tune the chunk size for ordinary workloads.
That means auto is often a good starting point. You usually reach for a manual batch size only when profiling shows that the default behavior is leaving performance on the table.
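As a quick check that batching affects only scheduling, the sketch below runs the same work with batch_size="auto" and with a hand-picked value. The function square, the input size, and the manual value of 10 are assumptions chosen for illustration.

```python
from joblib import Parallel, delayed


def square(x):
    # Hypothetical cheap task.
    return x * x


inputs = range(50)

# Spelling out the default explicitly.
auto_results = Parallel(n_jobs=2, batch_size="auto")(
    delayed(square)(i) for i in inputs
)

# A fixed batch size chosen by hand.
manual_results = Parallel(n_jobs=2, batch_size=10)(
    delayed(square)(i) for i in inputs
)

print(auto_results == manual_results)  # True
```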
What pre_dispatch Means
pre_dispatch controls how much work Joblib queues ahead of time before or while workers are running.
The default is pre_dispatch='2 * n_jobs'.
If n_jobs=4, that means Joblib initially dispatches enough work for eight scheduling units. The joblib documentation describes this amount as a number of batches of tasks, so a scheduling unit here means tasks as defined after batching, not necessarily raw individual function calls.
Example:
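A minimal sketch of passing pre_dispatch explicitly. The function work, the input range, and n_jobs=4 are illustrative assumptions; the string shown matches the documented default, so this call should behave like one that omits the argument.

```python
from joblib import Parallel, delayed


def work(x):
    # Hypothetical stand-in for a real task.
    return x + 1


# The string is an arithmetic expression; joblib substitutes n_jobs into it.
results = Parallel(n_jobs=4, pre_dispatch="2 * n_jobs")(
    delayed(work)(i) for i in range(20)
)
print(results[:5])  # [1, 2, 3, 4, 5]
```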
This setting tries to keep workers busy without materializing the entire workload too aggressively up front.
Why pre_dispatch Exists
If Joblib dispatched too little work, workers could become idle waiting for the main process to produce the next batch. If it dispatched too much work, memory usage could grow unnecessarily, especially when the task iterable is large or the arguments are heavy.
So pre_dispatch is a queue-depth control.
Lower values can help when:
- the input iterable is huge
- preparing tasks uses significant memory
- you want tighter control over how much work is in flight
Higher values can help when:
- tasks are cheap
- you want to reduce worker starvation
- task production is itself somewhat slow
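One way to observe the queue-depth effect is to record when a generator is asked for each task. The sketch below compares a small integer pre_dispatch with pre_dispatch="all". The helper names (slow_double, run), the sleep duration, and the task count are assumptions; exact timings depend on the machine and backend, so no expected numbers are shown.

```python
import time

from joblib import Parallel, delayed


def slow_double(x):
    # Hypothetical task; the sleep makes lazy consumption visible.
    time.sleep(0.05)
    return 2 * x


def run(pre_dispatch, n_tasks=8):
    produced_at = []  # seconds since start when each task was generated
    start = time.perf_counter()

    def producer():
        for i in range(n_tasks):
            produced_at.append(time.perf_counter() - start)
            yield delayed(slow_double)(i)

    results = Parallel(n_jobs=2, batch_size=1, pre_dispatch=pre_dispatch)(
        producer()
    )
    return results, produced_at


lazy_results, lazy_times = run(pre_dispatch=2)
eager_results, eager_times = run(pre_dispatch="all")

print("pre_dispatch=2: last task generated at", round(lazy_times[-1], 3), "s")
print("pre_dispatch='all': last task generated at", round(eager_times[-1], 3), "s")
```

With a small pre_dispatch, the later tasks are typically generated only as earlier ones finish; with "all", the whole iterable is consumed up front.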
How They Interact
The key interaction is this:
- batch_size determines how many raw tasks belong to one dispatch chunk
- pre_dispatch determines how many of those chunks are queued ahead of time
So if you raise both, you may reduce scheduling overhead but also increase memory usage and reduce balancing flexibility.
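A simple mental model is that the number of raw tasks queued ahead of time is roughly batch_size multiplied by the number of pre-dispatched chunks.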
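To put numbers on that mental model, here is a rough sketch. It follows the description in this section (pre_dispatch counted in chunks) and is an estimate for reasoning about memory, not Joblib's exact internal accounting. The function name approx_raw_tasks_queued is hypothetical.

```python
def approx_raw_tasks_queued(batch_size, pre_dispatch_chunks):
    # Rough estimate: chunks queued ahead of time, times tasks per chunk.
    return batch_size * pre_dispatch_chunks


n_jobs = 4
chunks = 2 * n_jobs  # the default expression "2 * n_jobs" with n_jobs=4

for batch_size in (1, 10, 100):
    print(batch_size, approx_raw_tasks_queued(batch_size, chunks))
# 1 8
# 10 80
# 100 800
```

If each raw task carries large arguments, the last row is where memory pressure would be expected to show up first.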
Practical Tuning Advice
If each task takes seconds, leave batch_size small or automatic. Scheduling overhead is not the bottleneck.
If each task takes microseconds or milliseconds, larger batches may help a lot.
If memory spikes or the task generator is huge, reduce pre_dispatch.
A small experiment often tells you more than guesswork:
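A minimal timing harness along those lines, assuming a hypothetical cheap function tiny_task and an arbitrary grid of settings. It records wall-clock time only; memory would need a separate tool.

```python
import time

from joblib import Parallel, delayed


def tiny_task(x):
    # Hypothetical cheap task, so scheduling overhead is a large share of runtime.
    return x * x


def time_config(batch_size, pre_dispatch, n_items=1000, n_jobs=2):
    # Wall-clock time for one Parallel call with the given settings.
    start = time.perf_counter()
    Parallel(n_jobs=n_jobs, batch_size=batch_size, pre_dispatch=pre_dispatch)(
        delayed(tiny_task)(i) for i in range(n_items)
    )
    return time.perf_counter() - start


# Warm-up call so one-time worker start-up is less likely to skew the first row.
time_config("auto", "2 * n_jobs", n_items=10)

timings = {}
for batch_size in ("auto", 1, 100):
    for pre_dispatch in ("2 * n_jobs", "all"):
        timings[(batch_size, pre_dispatch)] = time_config(batch_size, pre_dispatch)

# Fastest configuration first.
for (batch_size, pre_dispatch), seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"batch_size={batch_size!r:>6} pre_dispatch={pre_dispatch!r:<14} {seconds:.3f}s")
```

Single runs are noisy, so repeating each configuration a few times and comparing medians gives a more trustworthy picture.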
Measure wall-clock time and memory, then choose based on evidence.
Common Pitfalls
The most common mistake is assuming batch_size changes algorithmic parallelism. It only changes how tasks are grouped for scheduling.
Another mistake is forgetting that pre_dispatch applies to batches, not necessarily single function calls.
Developers also often increase both values at once and then wonder why memory usage grows or workers become unbalanced.
Finally, if tasks are very expensive, tuning these parameters usually matters much less than algorithm choice, I/O behavior, or backend selection.
Summary
- batch_size is how many tasks Joblib groups into one dispatch chunk.
- pre_dispatch is how many chunks Joblib queues ahead of time.
- Larger batches reduce scheduling overhead but can hurt load balancing.
- Higher pre-dispatch values keep workers busy but can increase memory usage.
- Start with the defaults and tune only when profiling shows a real scheduling problem.

