My numpy build doesn't use multiple CPU cores
Introduction
NumPy itself is not a general-purpose multithreading engine. Some operations use multiple CPU cores through linked BLAS or LAPACK libraries such as OpenBLAS or MKL, but many ordinary NumPy array operations are still single-threaded. So when a NumPy build appears to use only one core, the first question is not "is NumPy broken?" but "which operation am I measuring, and which math library is actually doing the work underneath it?"
Not Every NumPy Operation Is Parallel
This is the biggest misconception. Matrix multiplication and some linear algebra calls may use multiple threads through BLAS. Element-wise operations such as simple additions do not go through BLAS, and in a standard NumPy build they run on a single thread.
For example, x + y on two large arrays is an element-wise operation, not a BLAS call.
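A minimal timing sketch of such an element-wise operation. The array size and repeat count are arbitrary choices made so the loop runs long enough to watch in a CPU monitor; they are not values from any particular benchmark.

```python
import time

import numpy as np

n = 5_000_000   # arbitrary array length, large enough to time
repeats = 50    # repeat so the run lasts long enough to observe CPU usage

rng = np.random.default_rng(0)
x = rng.random(n)
y = rng.random(n)
z = np.empty_like(x)

start = time.perf_counter()
for _ in range(repeats):
    # Same result as z = x + y, but writes into z instead of allocating a new
    # array each time. This is a ufunc loop, not a BLAS call, so a standard
    # NumPy build runs it on a single thread.
    np.add(x, y, out=z)
elapsed = time.perf_counter() - start

print(f"{repeats} element-wise additions of {n:,} values took {elapsed:.3f} s")
```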
If you benchmark x + y and see one core doing most of the work, that does not prove your NumPy build is wrong.
Check Which BLAS Library NumPy Uses
The easiest first diagnostic is to print NumPy's build configuration with np.show_config() and read the BLAS and LAPACK entries.
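Printing the build configuration takes one call. The layout of the output varies between NumPy versions, so treat it as something to read rather than parse.

```python
import numpy as np

# Prints how this NumPy was built, including the BLAS and LAPACK libraries it
# was linked against (for example OpenBLAS, MKL or Accelerate).
np.show_config()
```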
Look for references to libraries such as:
- OpenBLAS
- MKL
- BLIS
- Accelerate on macOS
If NumPy is linked against a single-threaded or minimal backend, then matrix-heavy operations will not scale the way you expect.
Test a BLAS-Heavy Operation
To see whether the build can use multiple cores, test a large matrix multiplication rather than a simple element-wise expression.
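A minimal sketch of such a test, assuming float64 matrices. For 2-D float arrays, a @ b is handed to the BLAS matrix-multiply routine, which is the part a multithreaded backend can spread across cores. The size n = 2000 is an arbitrary choice; raise it if the run finishes too quickly to observe.

```python
import time

import numpy as np

n = 2000  # arbitrary matrix size; increase it for a longer-running test
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

start = time.perf_counter()
c = a @ b  # dispatched to the linked BLAS library's matrix multiply
elapsed = time.perf_counter() - start

print(f"{n}x{n} matrix multiplication took {elapsed:.2f} s")
```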
While this runs, observe CPU usage with your system tools. If the BLAS backend is multithreaded and configured to use several threads, you should typically see multiple cores active.
Thread Count May Be Limited by Environment Variables
Even when NumPy is linked against a multithreaded BLAS, the number of threads can be capped by environment variables.
Common ones include:
- OMP_NUM_THREADS
- OPENBLAS_NUM_THREADS
- MKL_NUM_THREADS
- VECLIB_MAXIMUM_THREADS (on some macOS setups)
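For example, to allow OpenBLAS four threads, set OPENBLAS_NUM_THREADS=4 in the environment before starting Python. For MKL, set MKL_NUM_THREADS=4 instead.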
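The same variables can also be set from inside Python, as long as that happens before NumPy is first imported: BLAS libraries typically read them when they initialize, so changing them afterwards usually has no effect. A sketch, where the value "4" is only an example:

```python
import os

# Set the limits BEFORE importing NumPy; OpenBLAS and MKL typically read these
# variables when the library initializes. "4" is an example value.
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

import numpy as np  # noqa: E402  (import deliberately placed after the settings)

a = np.ones((500, 500))
print((a @ a)[0, 0])
```

Setting the variables in the shell remains the more robust option, because it cannot be undone by some other module importing NumPy first.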
If one of these is set to 1, the build may be behaving exactly as configured.
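A quick way to see which of these limits are active in the current process is to read them back from the environment. A minimal sketch; the helper name thread_limits is illustrative:

```python
import os

# Thread-limit variables listed above.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def thread_limits():
    """Return the thread-limit variables that are set, mapped to their values."""
    return {name: os.environ[name] for name in THREAD_VARS if name in os.environ}


for name, value in thread_limits().items():
    print(f"{name}={value}")
```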
Use threadpoolctl to Inspect Runtime Thread Pools
A practical Python-side tool is threadpoolctl, which can reveal the thread pool libraries loaded into the process.
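A minimal sketch, assuming threadpoolctl has been installed separately (it is not a NumPy dependency); the helper name blas_pools is illustrative. threadpool_info() only reports libraries already loaded into the process, which is why NumPy is imported first.

```python
import numpy as np  # noqa: F401  (imported so its BLAS library is loaded)

try:
    from threadpoolctl import threadpool_info
except ImportError:  # threadpoolctl is a separate package
    threadpool_info = None


def blas_pools():
    """Return (library, num_threads) for each BLAS pool threadpoolctl reports."""
    if threadpool_info is None:
        return []
    return [
        (pool.get("internal_api"), pool["num_threads"])
        for pool in threadpool_info()
        if pool.get("user_api") == "blas"
    ]


for library, num_threads in blas_pools():
    print(f"{library}: {num_threads} thread(s)")
```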
This helps answer questions such as:
- which numeric backend is loaded
- how many threads it is configured to use
- whether OpenBLAS or MKL is actually present
That is often more informative than guessing from package names alone.
Building from Source Is Usually Not the First Fix
People often assume they need to rebuild NumPy manually. Usually they do not. In many environments, the simpler solution is to install a distribution already linked to a good BLAS implementation.
Examples include:
- conda packages that ship with MKL or OpenBLAS
- wheel builds that already link the intended backend
Rebuilding from source only makes sense when you have a specific reason and understand which numeric backend you want to link.
Common Pitfalls
- Expecting all NumPy operations to scale across cores the way matrix multiplication often does.
- Benchmarking element-wise array math and concluding that BLAS threading is broken.
- Forgetting that environment variables may limit BLAS thread count to one.
- Guessing about the backend instead of checking np.show_config() or threadpoolctl.
- Rebuilding NumPy before verifying whether the slow operation is even one that should be multithreaded.
Summary
- NumPy does not automatically multithread every operation.
- Multi-core behavior usually depends on the linked BLAS or LAPACK backend.
- Check the backend with np.show_config() and runtime thread pools with threadpoolctl.
- Benchmark a BLAS-heavy operation such as matrix multiplication, not just element-wise math.
- Verify thread-limit environment variables before assuming the build is wrong.

