How to auto-scale Kubernetes Pods based on the number of tasks in a Celery task queue?
Introduction
If Celery workers are running in Kubernetes, scaling them by CPU alone often misses the real signal: queue backlog. The practical solution is to expose queue length as a metric and let Kubernetes scale worker replicas from that metric, with KEDA usually being the cleanest implementation path.
Why CPU-Based HPA Is Often the Wrong Signal
A Celery worker can be idle on CPU while a backlog is building in the broker, or it can be busy for reasons unrelated to throughput. What you really care about is usually one of these:
- number of pending tasks
- tasks per worker
- queue latency or age of oldest task
That makes queue depth a better autoscaling input than pure CPU for many Celery systems.
The Easiest Modern Approach: KEDA
KEDA sits on top of Kubernetes and creates autoscaling behavior from event sources such as queues and brokers. For Celery deployments backed by RabbitMQ or Redis, that is often simpler than building a full custom-metrics pipeline yourself.
A common pattern is:
- run Celery workers in a Deployment
- install KEDA in the cluster
- configure a ScaledObject that watches the broker or queue
- let KEDA drive an HPA behind the scenes
Example with RabbitMQ Queue Length
This example shows the shape of the solution. The exact metadata keys depend on the broker and KEDA scaler you use, but the structure is the same.
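A minimal sketch of a ScaledObject for this pattern is given here. It assumes a worker Deployment named celery-worker, Celery's default queue name celery, and a broker URL available to the worker container in an environment variable called RABBITMQ_URL. These names and the replica limits are illustrative assumptions, not values from a real cluster.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: celery-worker-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: celery-worker             # hypothetical Deployment running the Celery workers
  minReplicaCount: 1                # keep one worker warm; illustrative value
  maxReplicaCount: 20               # upper bound; illustrative value
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: celery           # Celery's default queue name
        mode: QueueLength           # scale on the number of queued messages
        value: "50"                 # target backlog per replica
        hostFromEnv: RABBITMQ_URL   # assumed env var on the worker container holding the AMQP URL
```

For a Redis broker, the trigger type and its metadata keys differ, so check the scaler documentation for the KEDA version you run.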
A threshold such as value: "50" means, in effect, that the autoscaler aims for about 50 queued tasks per worker replica and scales out when the backlog per replica exceeds that target.
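As a rough guide to how the threshold turns into a replica count, the underlying HPA calculation for an average-value target can be approximated as follows. This is a simplified view: the real controller also applies a tolerance band, stabilization windows, and the configured minimum and maximum replica counts. The symbols $Q$ (current queue length) and $T$ (threshold) are introduced here for illustration.

$$
\text{desiredReplicas} = \left\lceil \frac{Q}{T} \right\rceil,
\qquad \text{e.g. } Q = 420,\ T = 50 \;\Rightarrow\; \left\lceil \frac{420}{50} \right\rceil = \lceil 8.4 \rceil = 9 .
$$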
If You Cannot Use KEDA
The alternative is exposing custom metrics through Prometheus and a metrics adapter, then pointing a standard HPA at those custom metrics.
That can work well, but it has more moving pieces:
- exporter for broker or queue metrics
- Prometheus scraping
- custom-metrics adapter
- HPA config using the exposed metric name
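The last item in that list could look roughly like the sketch below. The metric name rabbitmq_queue_messages_ready and the queue label are assumptions: the name actually visible to the HPA depends on your exporter and on how the metrics adapter's rules are written. The Deployment name and replica limits are also hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-worker             # hypothetical Deployment running the Celery workers
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready   # assumed name; set by your adapter rules
          selector:
            matchLabels:
              queue: celery                     # assumed label identifying the queue
        target:
          type: AverageValue
          averageValue: "50"                    # target backlog per replica
```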
If your only need is queue-based scaling, KEDA usually means fewer components to run and maintain.
Tune for Throughput, Not Just Queue Count
A queue length target only works if it matches the worker's actual processing capacity.
For example, if one worker reliably drains about 20 tasks per minute and your latency target is loose, a backlog threshold of 50 might be fine.
If tasks are long-running or highly variable, you may want to use a lower threshold or add task age as an additional signal.
The autoscaler is only as good as the metric's relationship to your service-level objective.
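One way to sanity-check a threshold against the figures above is a back-of-the-envelope drain-time estimate. It assumes each worker processes tasks at a steady rate and ignores new arrivals and pod startup time, so treat it as a lower bound on real waiting time. The symbols $r$ (tasks per minute per worker), $T$ (backlog threshold per worker), and $W$ (acceptable queue wait) are introduced here for illustration.

$$
t_{\text{drain}} = \frac{T}{r} = \frac{50\ \text{tasks}}{20\ \text{tasks/min}} = 2.5\ \text{min},
\qquad
T \le r \cdot W .
$$

With a wait target of $W = 1$ minute, for example, the same worker would need a threshold of at most $20 \cdot 1 = 20$.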
Worker Concurrency Still Matters
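Pod count is not the only scaling dimension. Celery also has worker-level concurrency settings. A deployment with two pods and a concurrency of 8 behaves differently from eight pods with a concurrency of 2, even though the total concurrency is 16 in both cases: fewer, larger pods share memory and failure domains, while more, smaller pods isolate tasks better but open more broker connections.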
Choose pod count and worker concurrency together based on:
- CPU and memory profile of tasks
- broker pressure
- task isolation needs
- startup time and scale-out speed
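As a sketch of where these two dimensions live, the Deployment below sets Celery's concurrency on the worker command line and leaves the pod count to the autoscaler. The image, the application module myapp, and the resource figures are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker               # hypothetical name
spec:
  # replicas is intentionally left unset so the autoscaler owns the pod count
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/myapp:latest   # hypothetical image
          # --concurrency sets the number of worker processes per pod
          command: ["celery", "-A", "myapp", "worker", "--loglevel=INFO", "--concurrency=8"]
          resources:
            requests:
              cpu: "2"              # illustrative; size to the task profile
              memory: 2Gi           # illustrative; size to the task profile
```

Whatever concurrency you pick, the per-replica queue threshold should be revisited, because a pod with 8 processes drains its share of the backlog faster than a pod with 2.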
Common Pitfalls
Scaling only on CPU is the most common mismatch for Celery workloads.
Setting the queue threshold without measuring real worker throughput often causes either over-scaling or persistent backlog.
Ignoring worker startup time also hurts. If pods take too long to become ready, the autoscaler may request more replicas promptly, but the new capacity arrives too late to absorb the burst.
Finally, do not forget the broker. If RabbitMQ or Redis becomes the bottleneck, adding more workers will not solve the real problem.
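The startup-time pitfall can be softened somewhat on the autoscaler side by polling the queue more often and allowing faster scale-up. The sketch below assumes a hypothetical celery-worker Deployment, the celery queue, and a RABBITMQ_URL environment variable on the worker container; the numbers are illustrative, not recommendations.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: celery-worker-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: celery-worker             # hypothetical Deployment
  pollingInterval: 10               # check the queue every 10 s (KEDA's default is 30 s)
  minReplicaCount: 2                # keep spare capacity warm to cover pod startup time
  maxReplicaCount: 20
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0    # act on the newest reading immediately
          policies:
            - type: Pods
              value: 10                    # allow adding up to 10 pods
              periodSeconds: 15            # per 15-second window
        scaleDown:
          stabilizationWindowSeconds: 300  # wait 5 minutes before removing pods to avoid flapping
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: celery
        mode: QueueLength
        value: "50"
        hostFromEnv: RABBITMQ_URL   # assumed env var holding the AMQP URL
```

This only shortens the autoscaler's reaction time. Image pull and worker boot time still have to be reduced separately, and faster scale-up puts more connection load on the broker.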
Summary
- queue depth is usually a better autoscaling signal for Celery workers than CPU alone
- KEDA is often the simplest way to scale Kubernetes workers from queue backlog
- configure scaling around the actual broker and queue you use
- tune thresholds using real throughput and latency targets, not guesswork
- pod count, Celery concurrency, and broker capacity all need to be considered together

