Build a Multi-Node Kafka Cluster on Docker Swarm

Apache Kafka is an open-source stream-processing platform, written in Scala and Java, that was originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue designed as a distributed transaction log," which makes it well suited to processing streaming data in enterprise infrastructure. Docker Swarm is a container orchestration tool: it lets you manage containers deployed across multiple host machines as a single cluster.

Setting Up a Multi-Node Kafka Cluster on Docker Swarm

Prerequisites:

  • Docker Engine: Ensure Docker is installed on all the machines intended to be used in the Swarm.
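  • Docker Compose file format: The cluster is described in a Compose file (version 3 or later), which docker stack deploy reads directly. The standalone Docker Compose tool is needed only if you also want to run the services on a single machine for testing.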

Steps to Set Up Docker Swarm:

  1. Initialize Swarm Mode: Designate one of your machines as the manager node, then initialize swarm mode on it by running:

   docker swarm init --advertise-addr <MANAGER-IP>

  2. Add Worker Nodes: On the manager node, retrieve the join token:

   docker swarm join-token worker

   Run the displayed command on each machine that should join the swarm as a worker.
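
The join command that the manager prints has a fixed shape: a token followed by the manager's address and the swarm management port (2377 by default). As a rough illustration, the Python sketch below assembles that command as an argument list. The function name, the token, and the IP address are hypothetical placeholders; use the values your own manager prints.

```python
from typing import List


def build_join_command(token: str, manager_ip: str, port: int = 2377) -> List[str]:
    """Return the argument list for joining a swarm as a worker."""
    if not token or not manager_ip:
        raise ValueError("token and manager_ip are required")
    # Same shape as the command printed by `docker swarm join-token worker`.
    return ["docker", "swarm", "join", "--token", token, f"{manager_ip}:{port}"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(build_join_command("SWMTKN-1-example", "192.0.2.10")))
```

After the workers have joined, running docker node ls on the manager should list every machine in the swarm.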

Create a Docker Compose File for Kafka Cluster

This file will define the services required for your Kafka cluster. An example docker-compose.yml for a simple cluster might look like this:

version: '3.7'

services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    networks:
      - kafka-net

  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      # Format: name:partitions:replicas
      KAFKA_CREATE_TOPICS: "Topic1:1:3,Topic2:1:1"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Assumption: resolves the Swarm node's hostname for _{HOSTNAME_COMMAND}
      # below, following the image's documented Swarm example; adjust it if
      # your Docker version formats the output of docker info differently.
      HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      # Ports the broker binds to; they must match the advertised ports.
      KAFKA_LISTENERS: INSIDE://:9093,OUTSIDE://:9092
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9093,OUTSIDE://_{HOSTNAME_COMMAND}:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    volumes:
      # Gives the container access to the Docker API (used by HOSTNAME_COMMAND).
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - kafka-net
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure

networks:
  kafka-net:
    driver: overlay
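
The listener settings in this file have to agree with each other: every advertised listener needs an entry in the security protocol map, and the inter-broker listener must be one of the advertised names. The Python sketch below shows one way to check that before deploying. It is only an illustration; the helper names are hypothetical, and the dictionary simply repeats the values from the compose file.

```python
from typing import Dict, List


def parse_pairs(value: str, sep: str) -> Dict[str, str]:
    """Split 'A<sep>x,B<sep>y' into {'A': 'x', 'B': 'y'}."""
    pairs = {}
    for item in value.split(","):
        item = item.strip()
        if item:
            name, _, rest = item.partition(sep)
            pairs[name] = rest
    return pairs


def check_listeners(env: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the settings agree."""
    protocols = parse_pairs(env.get("KAFKA_LISTENER_SECURITY_PROTOCOL_MAP", ""), ":")
    advertised = parse_pairs(env.get("KAFKA_ADVERTISED_LISTENERS", ""), "://")
    inter_broker = env.get("KAFKA_INTER_BROKER_LISTENER_NAME", "")

    problems = []
    for name in advertised:
        if name not in protocols:
            problems.append(f"listener {name} has no security protocol")
    if inter_broker not in advertised:
        problems.append(f"inter-broker listener {inter_broker!r} is not advertised")
    return problems


# Values copied from the compose file in this article.
compose_env = {
    "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP": "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT",
    "KAFKA_ADVERTISED_LISTENERS": "INSIDE://:9093,OUTSIDE://_{HOSTNAME_COMMAND}:9092",
    "KAFKA_INTER_BROKER_LISTENER_NAME": "INSIDE",
}

if __name__ == "__main__":
    print(check_listeners(compose_env) or "listener settings are consistent")
```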

Deploy the Stack

Deploy this configuration to Docker Swarm using:

 
docker stack deploy --compose-file docker-compose.yml kafka_stack

Validate the Cluster

Check the status of the deployed stack:

 
docker stack services kafka_stack
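
The REPLICAS column of that output shows running versus desired tasks (for example 3/3). If you want to check this from a script, a minimal Python sketch could look like the following. It assumes the output is produced with --format "{{.Name}} {{.Replicas}}" and that services are named <stack>_<service>; the function names and the sample text are hypothetical.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_replicas(output: str) -> Dict[str, Tuple[int, int]]:
    """Map service name to (running, desired) from lines like 'name 3/3'."""
    status = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        running, _, desired = fields[1].partition("/")
        status[fields[0]] = (int(running), int(desired))
    return status


def not_converged(status: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the services that have fewer running tasks than desired."""
    return sorted(name for name, (running, desired) in status.items() if running < desired)


def stack_status(stack: str) -> Dict[str, Tuple[int, int]]:
    """Query a live swarm manager; requires the docker CLI on this machine."""
    result = subprocess.run(
        ["docker", "stack", "services", "--format", "{{.Name}} {{.Replicas}}", stack],
        capture_output=True, text=True, check=True,
    )
    return parse_replicas(result.stdout)


if __name__ == "__main__":
    # Made-up sample output, so the sketch runs without a swarm.
    sample = "kafka_stack_kafka 2/3\nkafka_stack_zookeeper 1/1\n"
    print(not_converged(parse_replicas(sample)))
```

A service that stays below its desired count is worth inspecting with docker service ps and docker service logs.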

Understanding Kafka-Docker Configuration Parameters

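The environment variables in the compose file configure each Kafka broker. The key parameters are:

  • KAFKA_CREATE_TOPICS: Topics to create automatically on startup, each written as name:partitions:replicas (Topic1:1:3 means one partition replicated to three brokers).
  • KAFKA_ZOOKEEPER_CONNECT: Host and port of the ZooKeeper service that the brokers register with.
  • KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: Maps each listener name to a security protocol.
  • KAFKA_ADVERTISED_LISTENERS: Addresses that each broker publishes to ZooKeeper for clients and other brokers to connect to.
  • KAFKA_INTER_BROKER_LISTENER_NAME: Name of the listener that brokers use to talk to each other.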
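
To make the KAFKA_CREATE_TOPICS format concrete, here is a small Python sketch that splits the value into name, partition count, and replication factor, and flags topics whose replication factor is larger than the number of brokers. It assumes the name:partitions:replicas layout used by the wurstmeister/kafka image; the type and function names are hypothetical.

```python
from typing import List, NamedTuple


class TopicSpec(NamedTuple):
    name: str
    partitions: int
    replicas: int


def parse_create_topics(value: str) -> List[TopicSpec]:
    """Parse 'name:partitions:replicas[,...]' into TopicSpec tuples."""
    specs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) < 3:
            raise ValueError(f"expected name:partitions:replicas, got {entry!r}")
        specs.append(TopicSpec(parts[0], int(parts[1]), int(parts[2])))
    return specs


def topics_exceeding_brokers(specs: List[TopicSpec], broker_count: int) -> List[str]:
    """Return topics that ask for more replicas than there are brokers."""
    return [spec.name for spec in specs if spec.replicas > broker_count]


if __name__ == "__main__":
    specs = parse_create_topics("Topic1:1:3,Topic2:1:1")
    print(specs)
    print(topics_exceeding_brokers(specs, broker_count=3))
```

With the three replicas defined in the compose file, both topics fit; with fewer brokers running, Topic1 could not reach its replication factor of 3.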

Monitoring and Scaling

You can monitor Kafka on Docker Swarm with standard Docker tooling, such as docker service ps, docker service logs, and docker stats. To scale the cluster, change the replicas value under the deploy key of the kafka service in the compose file, then redeploy the stack with the same docker stack deploy command.
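
Scaling can also be done from the command line with docker service scale, which takes the full service name (<stack>_<service>) and the new replica count. The Python sketch below builds that command for the stack used in this article. It is an illustration only; the function name is hypothetical, and the command is run only if you call the helper against a live swarm.

```python
import subprocess
from typing import List


def scale_command(stack: str, service: str, replicas: int) -> List[str]:
    """Return the argument list for scaling one service of a stack."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Swarm names stack services as <stack>_<service>.
    return ["docker", "service", "scale", f"{stack}_{service}={replicas}"]


def scale(stack: str, service: str, replicas: int) -> None:
    """Run the scale command; requires the docker CLI on a manager node."""
    subprocess.run(scale_command(stack, service, replicas), check=True)


if __name__ == "__main__":
    print(" ".join(scale_command("kafka_stack", "kafka", 5)))
```

A count set this way is overwritten the next time the stack is redeployed from the compose file, so keep the file in sync. Also note that Kafka does not move existing partitions onto newly added brokers on its own; new brokers take on load only for new topics or after an explicit partition reassignment.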

Summary Table

Component | Description                                                            | Configuration Example
ZooKeeper | Handles coordination for the Kafka cluster                             | zookeeper:2181
Kafka     | Core messaging service                                                 | image: wurstmeister/kafka
Networks  | Docker overlay network for communication across nodes                  | kafka-net
Volumes   | Mounts the host's Docker socket into the Kafka container (not log storage) | /var/run/docker.sock:/var/run/docker.sock

This setup provides a starting point for a reliable, scalable Kafka cluster on Docker Swarm, with the high throughput and low latency needed to handle real-time data feeds across distributed systems.

