
How to install packages in Airflow docker-compose?


Introduction

Apache Airflow deployed via Docker Compose uses the official apache/airflow image, which includes core dependencies but not every Python package your DAGs may need. There are several ways to install additional packages: listing them in the _PIP_ADDITIONAL_REQUIREMENTS environment variable, building a custom Docker image from a requirements.txt file, or mounting a requirements file and installing it when the container starts. The best approach depends on whether you need quick iteration during development or reproducible builds for production.

Method 1: _PIP_ADDITIONAL_REQUIREMENTS (Quick Development)

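```yaml
# docker-compose.yaml
x-airflow-common: &airflow-common
  # python3.11 variant: pandas 2.1.x needs Python >= 3.9, while the default
  # apache/airflow:2.8.1 tag ships Python 3.8
  image: apache/airflow:2.8.1-python3.11
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # Space-separated list, installed by the image entrypoint on every start
    _PIP_ADDITIONAL_REQUIREMENTS: "pandas==2.1.4 scikit-learn==1.3.2 requests==2.31.0"
  depends_on:
    - postgres

services:
  # One-off job: migrate the metadata database, create a login user, then exit
  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: "true"
      _AIRFLOW_WWW_USER_CREATE: "true"
      _AIRFLOW_WWW_USER_USERNAME: airflow
      _AIRFLOW_WWW_USER_PASSWORD: airflow

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"
    depends_on:
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    depends_on:
      airflow-init:
        condition: service_completed_successfully

  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
```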

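_PIP_ADDITIONAL_REQUIREMENTS is an environment variable recognized by the official Airflow Docker image's entrypoint script. It holds a space-separated list of requirement specifiers, and the entrypoint runs pip install with them every time a container starts. This is convenient for development, but every restart pays the installation cost again, and the Airflow documentation recommends it only for quick experiments, not for production.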
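If your pins already live in a requirements.txt, you can flatten that file into the space-separated string this variable expects instead of maintaining two lists. The bash function below is a minimal sketch; reqs_to_pip_env is a hypothetical name, and it assumes one requirement per line with no spaces inside a specifier. It drops blank lines and everything after a "#", so it would also cut a URL fragment such as #egg=.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flatten a requirements file into one space-separated
# line suitable for _PIP_ADDITIONAL_REQUIREMENTS.
# Assumes one requirement per line; "#" comments and blank lines are dropped.
reqs_to_pip_env() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
    | grep -v '^$' \
    | paste -sd ' ' -
}

# Demo with a throwaway file
demo="$(mktemp)"
printf 'pandas==2.1.4\n# ML\nscikit-learn==1.3.2\n\nrequests==2.31.0  # HTTP\n' > "$demo"
reqs_to_pip_env "$demo"   # prints: pandas==2.1.4 scikit-learn==1.3.2 requests==2.31.0
rm -f "$demo"
```

If your compose file reads the value from the shell, as the official quick-start file does with ${_PIP_ADDITIONAL_REQUIREMENTS:-}, you could export _PIP_ADDITIONAL_REQUIREMENTS="$(reqs_to_pip_env requirements.txt)" before running docker compose up.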

Method 2: Custom Dockerfile (Production)

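```dockerfile
# Dockerfile
# python3.11 variant: pandas 2.1.x needs Python >= 3.9 (the default 2.8.1 tag ships 3.8)
FROM apache/airflow:2.8.1-python3.11

# Install system packages if needed (as root)
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libpq-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Switch back to airflow user for pip installs
USER airflow

# Copy and install Python requirements. Re-stating apache-airflow at the image's
# own version (AIRFLOW_VERSION is set by the official image) stops pip from
# silently upgrading or downgrading Airflow while resolving your packages.
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
```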
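```txt
# requirements.txt
pandas==2.1.4
scikit-learn==1.3.2
apache-airflow-providers-amazon==8.13.0
# Airflow 2.8 requires SQLAlchemy < 2.0; pin it only if your DAG code needs a specific 1.4.x release
sqlalchemy==1.4.50
```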
```yaml
# docker-compose.yaml
x-airflow-common: &airflow-common
  # Build from the local Dockerfile instead of pulling apache/airflow directly
  build:
    context: .
    dockerfile: Dockerfile
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
```

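Build the custom image with docker compose build, or rebuild and restart in one step with docker compose up -d --build. Packages are installed once at build time, so containers start quickly and every service runs the same tested set of packages. This is the recommended approach for production.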
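The reproducibility of this approach depends on the requirements file pinning exact versions. As a hedged sketch, the hypothetical bash function check_pinned below prints every requirement that is not pinned with ==, so a CI step could stop before docker compose build runs. It assumes simple name==version lines and skips comments, blank lines, and pip options such as -r or --constraint.

```bash
#!/usr/bin/env bash
# Hypothetical CI check: print requirement lines that are not pinned with "==".
# Returns 1 if any unpinned line is found, 0 otherwise.
check_pinned() {
  local file="$1" line status=0
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%%#*}"                        # drop comments
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
    [ -z "$line" ] && continue                # skip blank lines
    case "$line" in
      -*) continue ;;                         # pip options (-r, -c, --index-url ...)
      *==*) ;;                                # pinned: fine
      *) echo "unpinned: $line"; status=1 ;;
    esac
  done < "$file"
  return "$status"
}

# Demo: requests and scikit-learn are reported and the exit status is 1
demo="$(mktemp)"
printf 'pandas==2.1.4\nrequests>=2.31\nscikit-learn\n' > "$demo"
check_pinned "$demo" || echo "found unpinned requirements"
rm -f "$demo"
```

Exact pins only cover the packages you list; transitive dependencies can still drift between builds, which is what Airflow's constraints files address.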

Method 3: Mounting a requirements.txt

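```yaml
# docker-compose.yaml
x-airflow-common: &airflow-common
  # Python 3.11 variant (pandas 2.1.x needs Python >= 3.9)
  image: apache/airflow:2.8.1-python3.11
  volumes:
    - ./dags:/opt/airflow/dags
    - ./requirements.txt:/opt/airflow/requirements.txt
  environment:
    # Keep the built-in mechanism off so packages are not installed twice
    _PIP_ADDITIONAL_REQUIREMENTS: ""
  # Install from the mounted file, then hand over to the image's own /entrypoint.
  # "bash" fills $0, so each service's command (webserver, scheduler, ...) arrives
  # in "$@"; "$$" escapes "$" from Compose variable interpolation.
  entrypoint:
    - /bin/bash
    - -c
    - 'pip install --no-cache-dir -r /opt/airflow/requirements.txt && exec /entrypoint "$$@"'
    - bash
```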

This approach mounts the requirements file and installs packages via a custom entrypoint. It is more maintainable than inline _PIP_ADDITIONAL_REQUIREMENTS but still reinstalls on every startup. Use it when you want a file-based approach without building a custom image.
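The subtle part of a custom entrypoint is argument passing. Compose appends the service's command after the entrypoint, and bash -c binds the first extra argument to $0 and the rest to $@. The plain bash sketch below (no Docker needed) shows why a script that never uses "$@" silently drops webserver or scheduler, and how a placeholder $0 plus exec ... "$@" forwards it. Here echo airflow stands in for the real Airflow call, and broken/fixed are illustrative names.

```bash
#!/usr/bin/env bash
# Simulate what a container runs: <entrypoint ...> followed by <command ...>.

# Broken pattern: the script ignores its arguments, so "webserver" lands in $0 and is lost.
broken() { bash -c 'echo airflow' "$@"; }

# Working pattern: pass a placeholder for $0, then forward everything else with "$@".
fixed() { bash -c 'exec echo airflow "$@"' bash "$@"; }

broken webserver   # prints "airflow" (the subcommand is dropped)
fixed webserver    # prints "airflow webserver"
```

Inside a Compose file the same "$@" has to be written "$$@", because Compose treats a single $ as the start of a variable interpolation.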

Installing Airflow Provider Packages

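```dockerfile
# Dockerfile: installing provider packages
FROM apache/airflow:2.8.1

# The official image already runs as the airflow user; stated here for clarity
USER airflow

# Install specific provider packages, re-pinning apache-airflow to the image's
# version (AIRFLOW_VERSION is set by the official image) so pip cannot change it
RUN pip install --no-cache-dir \
    "apache-airflow==${AIRFLOW_VERSION}" \
    apache-airflow-providers-amazon==8.13.0 \
    apache-airflow-providers-google==10.12.0 \
    apache-airflow-providers-slack==8.0.0

# Alternatively, pull providers in through Airflow's extras:
# FROM apache/airflow:2.8.1-python3.11
# RUN pip install --no-cache-dir "apache-airflow[amazon,google,slack]==2.8.1"
```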

Airflow provider packages (for AWS, GCP, Slack, etc.) add operators, hooks, and sensors. They are versioned separately from core Airflow and should be pinned to specific versions.
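Because providers move independently of core Airflow, it helps to see exactly which provider versions an image ended up with. As a hedged sketch, the hypothetical function list_providers filters pip freeze output (read from stdin) down to apache-airflow-providers-* packages. Airflow's own airflow providers list command gives a similar view from inside a container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pip freeze` output on stdin and print
# "<provider> <version>" for each apache-airflow-providers-* package, sorted.
# Assumes pip freeze prints hyphenated names in "name==version" form.
list_providers() {
  grep -i '^apache-airflow-providers-' \
    | awk -F '==' 'NF == 2 { print $1, $2 }' \
    | sort
}

# Demo with sample freeze output (against a running stack you might use:
#   docker compose exec -T airflow-scheduler pip freeze | list_providers)
printf 'pandas==2.1.4\napache-airflow-providers-slack==8.0.0\napache-airflow==2.8.1\n' \
  | list_providers
```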

Installing System-Level Dependencies

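```dockerfile
# python3.11 variant: needed if requirements.txt pins pandas 2.1.x (Python >= 3.9)
FROM apache/airflow:2.8.1-python3.11

# Some Python packages need system libraries when pip has to build them from source
# (no prebuilt wheel), e.g. psycopg2 needs libpq-dev, Pillow needs libjpeg-dev,
# lxml needs libxml2-dev and libxslt-dev; build-essential provides the C compiler
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libpq-dev \
    libjpeg-dev \
    libxml2-dev \
    libxslt-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Back to the unprivileged user for Python packages; re-pin Airflow to the image's version
USER airflow
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
```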

Always switch to USER root for apt-get commands and back to USER airflow for pip install. The Airflow image runs as the airflow user by default for security.
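The root/airflow switching rule is easy to check mechanically. The hypothetical bash function check_pip_user is a rough lint sketch, not a real Dockerfile parser: it tracks the most recent USER instruction and reports any RUN line containing pip install while that user is root (or 0). It assumes uppercase instructions at the start of a line, assumes the base image starts as airflow (true for the official image), and only inspects the physical line where RUN appears, so a pip install on a continuation line is not caught.

```bash
#!/usr/bin/env bash
# Hypothetical lint sketch: flag "pip install" in RUN lines executed as root.
# Assumes the base image starts as the "airflow" user (true for apache/airflow).
# Only the physical line containing RUN is checked; continuation lines are ignored.
check_pip_user() {
  local file="$1" user="airflow" n=0 status=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case "$line" in
      USER\ *)
        user="${line#USER }"
        user="${user%%:*}"              # "root:root" -> "root"
        user="${user//[[:space:]]/}"    # drop stray spaces or CR
        ;;
      RUN\ *pip\ install*)
        if [ "$user" = "root" ] || [ "$user" = "0" ]; then
          echo "line $n: pip install runs as $user"
          status=1
        fi
        ;;
    esac
  done < "$file"
  return "$status"
}

# Usage: check_pip_user Dockerfile || echo "move pip install below USER airflow"
```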

Verifying Installed Packages

```bash
# Check packages in a running container
docker compose exec -T airflow-webserver pip list | grep pandas

# Run a Python import check
docker compose exec airflow-scheduler python -c "import pandas; print(pandas.__version__)"

# Save the full package list to a file on the host
# (-T disables the pseudo-TTY so the saved file gets plain line endings)
docker compose exec -T airflow-webserver pip freeze > installed_packages.txt
```
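A package showing up in pip list does not prove it matches your pins. As a hedged sketch, the hypothetical function compare_pins below compares a pinned requirements file with pip freeze output (for example the installed_packages.txt saved above) and reports missing packages or version drift. It assumes name==version lines and uses a simplified version of pip's name normalization (lowercase, runs of -, _ and . treated as one -).

```bash
#!/usr/bin/env bash
# Hypothetical check: compare pinned requirements with `pip freeze` output.
# Usage: compare_pins requirements.txt installed_packages.txt
# Prints one line per missing package or version mismatch; returns 1 if any.
compare_pins() {
  awk -F '==' '
    # Simplified PEP 503-style normalization: lowercase, runs of "-", "_" or "." become "-".
    function norm(s) { s = tolower(s); gsub(/[-_.]+/, "-", s); gsub(/[ \t\r]/, "", s); return s }

    # First operand (pip freeze output): remember installed versions by name.
    FILENAME == ARGV[1] { if (NF == 2) { v = $2; gsub(/[ \t\r]/, "", v); installed[norm($1)] = v }; next }

    # Second operand (requirements): skip comments and blank lines.
    /^[ \t\r]*#/ || /^[ \t\r]*$/ { next }

    NF == 2 {
      name = norm($1); want = $2
      sub(/#.*$/, "", want); gsub(/[ \t\r]/, "", want)
      if (!(name in installed)) { print "missing: " name; bad = 1 }
      else if (installed[name] != want) { print "mismatch: " name " wants " want ", found " installed[name]; bad = 1 }
    }

    END { exit bad }
  ' "$2" "$1"
}
```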

Common Pitfalls

  • Using _PIP_ADDITIONAL_REQUIREMENTS in production: This environment variable runs pip install on every container startup, which slows every start (by minutes for heavy packages) and makes deployments non-reproducible, because any dependency that is not pinned, including transitive ones, can resolve to a different version on the next install. Build a custom Docker image with a pinned requirements.txt for production.
  • Forgetting to install packages in all Airflow services: The webserver, scheduler, and any workers or triggerer all need the same packages. Using the x-airflow-common YAML anchor ensures all services share the same image and environment. If one service is missing a package, imports and tasks fail only on that service.
  • Installing Python packages as root instead of the airflow user: pip run as root can put packages somewhere other than the airflow user's environment and leaves root-owned files in the image, and the official image documentation says to install Python packages as the airflow user. Use USER airflow before pip install in the Dockerfile, and use USER root only for system-level apt-get commands.
  • Version conflicts with Airflow's pinned dependencies: Airflow pins specific versions of packages such as SQLAlchemy, Flask, and Jinja2, and installing incompatible versions breaks Airflow. Install with pip install --constraint <constraints-url>, or compare your pins against Airflow's constraints file, for example https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt for Airflow 2.8.1 on Python 3.11. Pick the file that matches your image's Python version.
  • Not cleaning apt cache in Dockerfile: Leaving apt-get cache in the Docker image unnecessarily inflates image size. Always add && apt-get clean && rm -rf /var/lib/apt/lists/* to the same RUN layer as apt-get install.
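The constraints URL follows a fixed pattern, constraints-<Airflow version>/constraints-<Python major.minor>.txt, so a build script can derive it instead of hard-coding it. The hypothetical helper below sketches that; the pattern is taken from the constraints URL quoted in the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Airflow constraints URL for a given Airflow
# version and Python version (only the major.minor part of Python is used).
constraints_url() {
  local airflow_version="$1" py_mm
  py_mm="$(printf '%s' "$2" | cut -d. -f1,2)"   # "3.11.7" -> "3.11"
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' \
    "$airflow_version" "$py_mm"
}

constraints_url 2.8.1 3.11.7
# prints https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt
```

The result can be passed to pip install --constraint. Keep in mind that pip reports a conflict if your own requirements pin a different version of a package that the constraints file also pins.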

Summary

  • For development: use _PIP_ADDITIONAL_REQUIREMENTS environment variable for quick package installation
  • For production: build a custom Docker image with a pinned requirements.txt
  • Install system dependencies as root, Python packages as airflow user
  • Use x-airflow-common YAML anchors to ensure all services get the same packages
  • Pin package versions and respect Airflow's dependency constraints
  • Verify installation with docker compose exec ... pip list or Python import checks
