*The author selected Apache Software Foundation to receive a donation as part of the Write for DOnations program.*

Introduction

Apache Kafka is an open-source distributed event and stream-processing platform written in Java, built to process demanding real-time data feeds. It is designed to be fault-tolerant, with support for hundreds of nodes per cluster. Running a large number of nodes efficiently calls for containerization and an orchestration platform, such as Kubernetes, to make the best use of resources.

In this tutorial, you'll learn how to deploy Kafka using Docker Compose. You'll also learn how to deploy it on Kubernetes using Strimzi, which integrates into Kubernetes and allows you to configure and maintain Kafka clusters using regular Kubernetes manifests without manual overhead.

Prerequisites

To follow this tutorial, you will need:

Docker installed on your machine. For Ubuntu, visit How To Install and Use Docker on Ubuntu. You only need to complete Step 1 and Step 2.
Docker Compose installed on your machine. For Ubuntu, visit How To Install and Use Docker Compose on Ubuntu. You only need to complete Step 1 and Step 2.
A Kubernetes v1.23+ cluster with your connection configured as the kubectl default. Instructions on how to configure kubectl are shown under the Connect to your Cluster step when you create your cluster. To create a Kubernetes cluster with your cloud provider, read the Kubernetes Quickstart.
The Helm 3 package manager installed on your local machine. Complete Step 1 of the How To Install Software on Kubernetes Clusters with the Helm 3 Package Manager tutorial.
An understanding of Kafka, including topics, producers, and consumers. For more information, please visit Introduction to Kafka.

Step 1 - Running Kafka Using Docker Compose

In this section, you'll learn how to run Kafka using Docker Compose in KRaft mode. Utilizing KRaft streamlines the overall configuration and resource usage as no ZooKeeper instances are required.

First, you'll define a Docker image that contains an unpacked Kafka release. You'll use it to test the connection to the Kafka container by using the included scripts.

You'll store the necessary commands in a Dockerfile. Create and open it for editing:

				
					nano Dockerfile

Add the following lines:

				
					[label Dockerfile]
FROM ubuntu:latest AS build
# Refresh the package index and install curl and Java in a single layer
RUN apt-get update && apt-get install -y curl default-jre
WORKDIR /kafka-test
RUN curl -o kafka.tgz <^>https://dlcdn.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz<^>
RUN tar -xzf kafka.tgz --strip-components=1

The container is based on the latest Ubuntu version. After updating the package cache and installing curl and Java, you download a Kafka release package. At the time of writing, the latest version of Kafka was 3.7.0. You can look up the latest version on the official Downloads page and replace the highlighted value if required.

Then, you set the WORKDIR (working directory) to /kafka-test, to which you download and extract the Kafka release. The --strip-components=1 parameter is passed into tar to skip the first directory of the archive, which is named after the archive itself.
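To see the effect of --strip-components=1 without downloading anything, the following sketch builds a small stand-in archive and extracts it both ways. The names kafka_demo and demo.tgz are made up for illustration, and GNU tar (as shipped in the Ubuntu image) is assumed.

```shell
# Work in a throwaway directory so nothing else on disk is touched.
workdir="$(mktemp -d)"

# Build a stand-in archive whose contents sit under one top-level
# directory, like the Kafka release archive (names are hypothetical).
mkdir -p "$workdir/src/kafka_demo/bin"
echo 'echo hello' > "$workdir/src/kafka_demo/bin/demo.sh"
tar -czf "$workdir/demo.tgz" -C "$workdir/src" kafka_demo

# Plain extraction keeps the top-level directory.
mkdir "$workdir/plain"
tar -xzf "$workdir/demo.tgz" -C "$workdir/plain"

# With --strip-components=1 the first path element is dropped.
mkdir "$workdir/stripped"
tar -xzf "$workdir/demo.tgz" -C "$workdir/stripped" --strip-components=1

plain_listing="$(cd "$workdir/plain" && find . -type f | sort)"
stripped_listing="$(cd "$workdir/stripped" && find . -type f | sort)"
echo "plain:    $plain_listing"
echo "stripped: $stripped_listing"

rm -rf "$workdir"
```

In the Dockerfile, the same flag is what places bin/ directly under /kafka-test instead of under a versioned subdirectory.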

Save and close the file.

Next, you'll define the Docker Compose configuration in a file named kafka-compose.yaml. Create and open it for editing by running:

				
					nano kafka-compose.yaml

Add the following lines:

				
					[label kafka-compose.yaml]
version: '3'

services:
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-test:
    build:
      dockerfile: Dockerfile
      context: .
    tty: true

Here you define two services, kafka and kafka-test. The kafka service is based on the latest Bitnami Kafka image. Under the environment section, you pass in the necessary environment variables and their values, which configure the Kafka node to be standalone with an ID of 0.
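The Bitnami image is understood to translate each KAFKA_CFG_-prefixed variable into a Kafka configuration property by dropping the prefix, lowercasing the rest, and replacing underscores with dots. The sketch below reproduces that naming rule as an assumption about the image's behavior, not as a copy of its startup scripts; cfg_to_property is a hypothetical helper name.

```shell
# Convert a KAFKA_CFG_* variable name into the Kafka property key it is
# assumed to set: drop the prefix, lowercase, and turn "_" into ".".
cfg_to_property() {
  local name="${1#KAFKA_CFG_}"
  printf '%s\n' "$name" | tr 'A-Z_' 'a-z.'
}

for var in KAFKA_CFG_NODE_ID KAFKA_CFG_PROCESS_ROLES KAFKA_CFG_CONTROLLER_QUORUM_VOTERS; do
  printf '%-36s -> %s\n' "$var" "$(cfg_to_property "$var")"
done
```

Read this way, KAFKA_CFG_NODE_ID=0 corresponds to node.id=0, and KAFKA_CFG_PROCESS_ROLES=controller,broker corresponds to process.roles=controller,broker.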

For kafka-test, you pass in the Dockerfile you've just created as the base for building the image of the container. By setting tty to true, you leave a session open with the container. This is necessary to keep it alive, as it would otherwise exit immediately after startup with no further commands.

Save and close the file, then run the following command to bring up the services in the background:

				
					docker-compose -f kafka-compose.yaml up -d

The output will be long because kafka-test will be built for the first time. The end of the output will be:

				
					[secondary_label Output]
...
Creating docker_kafka_1      ... done
Creating docker_kafka-test_1 ... done

You can list the running containers with:

				
					docker ps

The output will look like the following:

				
					[secondary_label Output]
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS      NAMES
3ce3e3190f6e   bitnami/kafka:latest   "/opt/bitnami/script…"   4 seconds ago   Up 3 seconds   9092/tcp   docker_kafka_1
2a0cd13859e3   docker_kafka-test      "/bin/bash"              4 seconds ago   Up 3 seconds              <^>docker_kafka-test_1<^>

Open a shell in the kafka-test container by running:

				
					docker exec -it <^>docker_kafka-test_1<^> bash

The shell will already be positioned in the /kafka-test directory:

				
					root@2a0cd13859e3:/kafka-test#

Then, try creating a topic using kafka-topics.sh:

				
					bin/kafka-topics.sh --create --topic first-topic --bootstrap-server <^>kafka<^>:9092

Note that you refer to Kafka by its name in the Docker Compose configuration (kafka).

The output will be:

				
					[secondary_label Output]
Created topic first-topic.

You've successfully connected to the Kafka deployment from within the Docker Compose service. Type in exit and press Enter to close the shell.

To stop the Docker Compose deployment, run the following command:

				
					docker-compose -f kafka-compose.yaml down

In this step, you've deployed Kafka using Docker Compose. You've also tested that Kafka is available from within other containers by deploying a custom image that contains shell scripts for connecting to it. In the rest of the tutorial, you'll learn how to deploy Kafka on Kubernetes.

Step 2 - Installing Strimzi to Kubernetes

In this section, you'll install Strimzi to your Kubernetes cluster. This entails adding its repository to Helm and creating a Helm release.

You'll first need to add the Strimzi Helm repository to Helm, which contains the Strimzi chart:

				
					helm repo add strimzi https://strimzi.io/charts

The output will be:

				
					[secondary_label Output]
"strimzi" has been added to your repositories

Then, refresh Helm’s cache to download its contents:

				
					helm repo update

You'll see the following output:

				
					[secondary_label Output]
...Successfully got an update from the "strimzi" chart repository
Update Complete. ⎈Happy Helming!⎈

Finally, install Strimzi to your cluster by running:

				
					helm install strimzi strimzi/strimzi-kafka-operator

The output will look like this:

				
					[secondary_label Output]
NAME: strimzi
LAST DEPLOYED: ...
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing strimzi-kafka-operator-0.40.0

To create a Kafka cluster refer to the following documentation.

https://strimzi.io/docs/operators/latest/deploying.html#deploying-cluster-operator-helm-chart-str

You now have Strimzi installed in your Kubernetes cluster. In the next section, you'll use it to deploy Kafka to your cluster.

Step 3 - Deploying a Kafka Cluster to Kubernetes

In this section, you'll deploy a one-node Kafka cluster with ZooKeeper to your Kubernetes cluster. At the time of writing, support for deploying Kafka using KRaft was not generally available in Strimzi.

You'll store the Kubernetes manifest for the deployment in a file named kafka.yaml. Create and open it for editing:

				
					nano kafka.yaml

Add the following lines to your file:

				
					[label kafka.yaml]
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

The first block of the spec configures Kafka itself. You set the version, as well as the number of replicas. Then, you define two _listeners_, which are the ports that the Kafka deployment will use to communicate. The second listener is encrypted because you set tls to true. Since two listeners can't share a port, you assign 9093 to the second one.

Since you're deploying only one Kafka node, you set the replication factors in the config section (for the offsets topic, the transaction state log, and the default for new topics) to 1, along with the minimum in-sync replica counts. For storage, you set the type to jbod (meaning "just a bunch of disks"), which allows you to specify multiple volumes. Here, you define one volume of type persistent-claim with a size of 100Gi. This will create a persistent volume with your cloud provider and attach it to Kafka. You also set deleteClaim to false to ensure that the data isn't deleted when the Kafka cluster is destroyed.
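The size value in the manifest uses the binary suffix Gi, which Kubernetes treats as 2^30 bytes, rather than the decimal suffix G (10^9 bytes). As a quick illustration of the difference, this arithmetic sketch compares the two for a value of 100:

```shell
# Bytes requested by "100Gi" (binary suffix, 2^30 bytes per Gi).
gib_bytes=$((100 * 1024 * 1024 * 1024))
# Bytes requested by "100G" (decimal suffix, 10^9 bytes per G).
gb_bytes=$((100 * 1000 * 1000 * 1000))

echo "100Gi = $gib_bytes bytes"
echo "100G  = $gb_bytes bytes"
echo "difference = $((gib_bytes - gb_bytes)) bytes"
```

The gap is a little over 7 billion bytes, which can matter when a provider bills or provisions volumes in fixed increments.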

To configure the zookeeper deployment, you set its number of replicas to 1 and provide it with a single persistent-claim of 100Gi, as only Kafka supports the jbod storage type. The two definitions under entityOperator instruct Strimzi to create cluster-wide operators for handling Kafka topics and users.

Save and close the file, then apply it by running:

				
					kubectl apply -f kafka.yaml

kubectl will display the following output:

				
					[secondary_label Output]
kafka.kafka.strimzi.io/my-cluster created

You can watch the deployment become available by running:

				
					kubectl get strimzipodset -w

After a few minutes, both the Kafka and ZooKeeper pods will become available and ready:

				
					[secondary_label Output]
NAME                   PODS   READY PODS   CURRENT PODS   AGE
...
my-cluster-kafka       1      <^>1<^>            1              28s
my-cluster-zookeeper   1      <^>1<^>            1              61s

To list Kafka deployments, run the following command:

				
					kubectl get kafka

You'll see output similar to this:

				
					[secondary_label Output]
NAME         DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS   READY   METADATA STATE   WARNINGS
my-cluster   1                        1

Now that Kafka is running, you'll create a topic in it. Open a file called kafka-topic.yaml for editing:

				
					nano kafka-topic.yaml

Add the following lines:

				
					[label kafka-topic.yaml]
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: "my-cluster"
spec:
  partitions: 1
  replicas: 1

This KafkaTopic defines a topic called my-topic in the cluster you've just deployed (my-cluster).

Save and close the file, then apply it by running:

				
					kubectl apply -f kafka-topic.yaml

The output will be:

				
					[secondary_label Output]
kafkatopic.kafka.strimzi.io/my-topic created

Then, list all Kafka topics in the cluster:

				
					kubectl get kafkatopic

kubectl will show the following output:

				
					[secondary_label Output]
NAME       CLUSTER      PARTITIONS   REPLICATION FACTOR   READY
my-topic   my-cluster   1            1                    True

In this step, you've deployed Kafka to your Kubernetes cluster using Strimzi, which takes care of the actual resources and ZooKeeper instances. You've also created a topic, which you'll use in the next step when connecting to Kafka.

Step 4 - Connecting to Kafka in Kubernetes

In this section, you'll learn how to connect to a Kafka cluster deployed on Kubernetes from within the cluster.

Thanks to Strimzi, your Kafka deployment is already available to pods in the cluster. Any app running inside the cluster can connect to the <^>my-cluster<^>-kafka-bootstrap endpoint, which automatically resolves to the my-cluster Kafka cluster.
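The bootstrap endpoint name is derived from the cluster name, so it changes if you name your cluster differently. The sketch below assembles the address from its parts. The helper name bootstrap_address is hypothetical, and the namespaced form relies on the standard Kubernetes service DNS pattern (service.namespace.svc), which is an assumption about how your cluster's DNS is set up.

```shell
# Build the in-cluster bootstrap address for a Strimzi-managed cluster.
# Arguments: cluster name, optional namespace, optional port (default 9092).
bootstrap_address() {
  local cluster="$1" namespace="${2:-}" port="${3:-9092}"
  local host="${cluster}-kafka-bootstrap"
  # Append the namespace only when one is given, for cross-namespace use.
  if [ -n "$namespace" ]; then
    host="${host}.${namespace}.svc"
  fi
  printf '%s:%s\n' "$host" "$port"
}

# Same namespace, plain listener.
bootstrap_address my-cluster
# From another namespace, TLS listener.
bootstrap_address my-cluster default 9093
```

The short form is enough for pods in the same namespace as the Kafka cluster, which is the case for the temporary pods used in this step.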

You'll now deploy a temporary pod to Kubernetes based on a Docker image that Strimzi provides. The image contains a Kafka installation with shell scripts for producing and consuming textual messages (kafka-console-producer.sh and kafka-console-consumer.sh).

Run the following command to run the producer script in-cluster:

				
					kubectl run kafka-producer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-producer.sh --bootstrap-server <^>my-cluster-kafka-bootstrap:9092<^> --topic <^>my-topic<^>

The temporary pod will be called kafka-producer and will use the image from the Strimzi project. It will be deleted after the command finishes executing (--rm=true) and will never be restarted (--restart=Never), as it's a one-time job. Then, you pass in the command to run the kafka-console-producer.sh script. As noted previously, you pass in the my-cluster-kafka-bootstrap designator for the server and my-topic as the topic name.

The output will look like this:

				
					[secondary_label Output]
If you don't see a command prompt, try pressing enter.
>

You can input any text message and press Enter to send it to the topic:

				
					[secondary_label Output]
If you don't see a command prompt, try pressing enter.
>Hello World!
>

To exit, press CTRL+C and confirm with Enter. Then, run the following command to run the consumer script in-cluster:

				
					kubectl run kafka-consumer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-consumer.sh --bootstrap-server <^>my-cluster-kafka-bootstrap:9092<^> --topic <^>my-topic<^> --from-beginning

You may need to press Enter for the command to proceed. The output will be similar to this:

				
					[secondary_label Output]
Hello World!
...
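Both kubectl run commands in this step use an image tag that combines two versions: the Strimzi release (0.40.0, as reported by the Helm install in Step 2) and the Kafka version (3.7.0, as set in kafka.yaml). If you deployed different versions, the tag would need to change with them. The helper below is a hypothetical sketch that assumes the tag keeps the pattern seen in this tutorial; check the image registry for the tags that actually exist.

```shell
# Hypothetical helper: compose the Strimzi Kafka image reference from the
# Strimzi and Kafka versions, following the pattern used in this tutorial.
strimzi_image() {
  local strimzi_version="$1" kafka_version="$2"
  printf 'quay.io/strimzi/kafka:%s-kafka-%s\n' "$strimzi_version" "$kafka_version"
}

strimzi_image 0.40.0 3.7.0
```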

You've learned how to connect to your Kafka deployment from within the cluster. You'll now expose Kafka to the outside world.

Step 5 - Exposing Kafka Outside of Kubernetes

In this step, you'll expose your Kafka deployment externally using a load balancer.

Strimzi has a built-in way of creating and configuring a load balancer for Kafka. Open kafka.yaml for editing by running:

				
					nano kafka.yaml

Add the following lines to the listeners section:

				
					[label kafka.yaml]
...
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      <^>- name: external<^>
        <^>port: 9094<^>
        <^>type: loadbalancer<^>
        <^>tls: false<^>
...

The highlighted part defines a new listener with the type loadbalancer that will accept connections at port 9094 without TLS encryption.
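With three listeners now defined, each one still needs its own port. A quick way to sanity-check a list of ports before applying the manifest is sketched below; duplicate_ports is a hypothetical helper, and the port numbers are typed in by hand rather than read from kafka.yaml.

```shell
# Print any port that appears more than once in the argument list.
duplicate_ports() {
  printf '%s\n' "$@" | sort -n | uniq -d
}

# Ports of the plain, tls, and external listeners from kafka.yaml.
dupes="$(duplicate_ports 9092 9093 9094)"
if [ -z "$dupes" ]; then
  echo "all listener ports are unique"
else
  echo "duplicate ports: $dupes"
fi
```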

Save and close the file, then apply the new manifest by running:

				
					kubectl apply -f kafka.yaml

The output will be:

				
					[secondary_label Output]
kafka.kafka.strimzi.io/my-cluster configured

Run the following command to watch it become available:

				
					kubectl get service my-cluster-kafka-external-bootstrap -w -o=jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'

When the load balancer that fronts traffic for Kafka becomes available, the output will be its IP address.
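If you'd rather capture the address in a script than watch the output, one option is to poll until the field is populated. The sketch below is a generic retry loop under that assumption: wait_for_output and get_lb_ip are hypothetical helper names, and the block only defines them without contacting the cluster.

```shell
# Poll a command until it prints a non-empty value or attempts run out.
# Arguments: max attempts, delay in seconds, then the command to run.
wait_for_output() {
  local attempts="$1" delay="$2" value
  shift 2
  while [ "$attempts" -gt 0 ]; do
    # Treat a failing command the same as an empty result.
    value="$("$@" 2>/dev/null || true)"
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# The query from this step, without -w, wrapped so it can be polled.
get_lb_ip() {
  kubectl get service my-cluster-kafka-external-bootstrap \
    -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

With kubectl configured, running `wait_for_output 30 10 get_lb_ip` would check every 10 seconds for up to 30 attempts and print the IP address once it is assigned.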

To test the connection from outside the cluster, you'll need a Java runtime and a Kafka release downloaded and extracted on your local machine, such as the archive referenced in the Dockerfile from Step 1. Navigate to that directory and run the console consumer, replacing <^>your_lb_ip<^> with the IP address from the output of the previous command:

				
					bin/kafka-console-consumer.sh --bootstrap-server <^>your_lb_ip<^>:9094 --topic my-topic --from-beginning

You'll soon see the messages being read from the topic, meaning that you've successfully connected:

				
					[secondary_label Output]
Hello World!
...

To delete all Strimzi-related resources from your cluster (such as Kafka deployments and topics), run the following command:

				
					kubectl delete $(kubectl get strimzi -o name)

Conclusion

In this article, you've deployed Kafka using Docker Compose and verified that you can connect to it. You've also learned how to install Strimzi to your Kubernetes cluster and deployed a Kafka cluster using the provided manifests.