In this tutorial, we will install Apache Kafka on AlmaLinux 9. Apache Kafka is a distributed streaming platform designed for high-throughput, low-latency data streaming, widely used for building real-time data pipelines and streaming applications. Kafka lets you publish, subscribe to, store, and process streams of records in a distributed, fault-tolerant manner, making it a popular choice for organizations dealing with large-scale, real-time data feeds.
Prerequisites
Before starting, ensure you have the following:
- An AlmaLinux 9 dedicated server with a non-root user with sudo privileges.
- Java Development Kit (JDK) installed on your server (we install OpenJDK in Step 2).
- At least 2GB of RAM.
Step 1: Update the System
Start by updating the package list and upgrading the system packages to the latest versions.
sudo dnf update -y
Step 2: Install Java
Kafka requires Java to run. Install the latest version of OpenJDK available in the AlmaLinux repositories.
sudo dnf install java-21-openjdk-devel -y
(Optional) Change the current Java version
If you already have a different version of Java installed, you can switch between versions with the following command:
sudo update-alternatives --config java
Select the Java version you want by entering its number and pressing Enter.
Verify the installation:
java -version
You should see an output similar to:
openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment (Red_Hat-21.0.3.0.9-1) (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-21.0.3.0.9-1) (build 21.0.3+9-LTS, mixed mode, sharing)
Step 3: Create Kafka User
For security reasons, it’s a good practice to create a dedicated user for Kafka.
sudo useradd -m -s /bin/bash kafka
sudo passwd kafka
Switch to the Kafka user:
sudo su - kafka
Step 4: Download and Extract Kafka
Download the latest stable version of Kafka from the official Apache Kafka download page. At the time of writing, the latest stable release is 3.7.0; adjust the version number in the commands below if a newer release is available.
wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar -xzf kafka_2.13-3.7.0.tgz
mv kafka_2.13-3.7.0 kafka
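Optionally, you can verify the download against the SHA-512 checksum published alongside the archive (the filename assumes version 3.7.0). Apache publishes the checksum in a formatted layout that sha512sum -c cannot parse directly, so compare the two digests by eye:

```shell
# Fetch the published checksum and compute the local one for comparison.
wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz.sha512
cat kafka_2.13-3.7.0.tgz.sha512
sha512sum kafka_2.13-3.7.0.tgz
```

If the digests differ, delete the archive and download it again before extracting.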
Step 5: Configure Kafka
Kafka requires Zookeeper, which comes bundled with Kafka for development and testing purposes. In a production environment, you should set up a dedicated Zookeeper cluster.
Configure Zookeeper
Create a data directory for Zookeeper:
mkdir -p ~/kafka/data/zookeeper
Edit the Zookeeper configuration file:
vim ~/kafka/config/zookeeper.properties
Update the dataDir property to point to the new data directory:
dataDir=/home/kafka/kafka/data/zookeeper
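If you prefer a non-interactive edit, the same change can be made with sed. This is a sketch that assumes the default zookeeper.properties, where a dataDir line is already present:

```shell
# Replace the existing dataDir= line in place so Zookeeper stores its
# data under the new directory.
sed -i 's|^dataDir=.*|dataDir=/home/kafka/kafka/data/zookeeper|' ~/kafka/config/zookeeper.properties
```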
Configure Kafka Broker
Create a data directory for Kafka:
mkdir -p ~/kafka/data/kafka
Edit the Kafka configuration file:
vim ~/kafka/config/server.properties
Update the following properties:
log.dirs=/home/kafka/kafka/data/kafka
zookeeper.connect=localhost:2181
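As with the Zookeeper config, both changes can be applied non-interactively with sed. This sketch assumes the default server.properties, where log.dirs and zookeeper.connect lines already exist:

```shell
# Rewrite the two broker settings in place: the log directory and the
# Zookeeper connection string.
sed -i \
  -e 's|^log.dirs=.*|log.dirs=/home/kafka/kafka/data/kafka|' \
  -e 's|^zookeeper.connect=.*|zookeeper.connect=localhost:2181|' \
  ~/kafka/config/server.properties
```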
Step 6: Start Zookeeper and Kafka
Open two terminal sessions: one for Zookeeper and another for Kafka. Ensure you are logged in as the Kafka user in both.
Start Zookeeper
~/kafka/bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties
Start Kafka
In the second terminal session, start Kafka:
~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties
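Once both processes are up, a quick sanity check from a third terminal is to confirm they are listening on their default ports (2181 for Zookeeper, 9092 for Kafka); this assumes you kept the default port configuration:

```shell
# Show listening TCP sockets and filter for the Zookeeper and Kafka ports.
ss -ltn | grep -E ':(2181|9092)' || echo "ports not open yet"
```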
Step 7: Testing the Installation
Create a Topic
In a new terminal session, still logged in as the Kafka user, create a test topic:
~/kafka/bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
List Topics
Verify the topic was created:
~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Produce Messages
Start a Kafka producer:
~/kafka/bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
Type a few messages and hit Enter after each:
Hello Kafka
This is a test message
Consume Messages
Open another terminal session and start a Kafka consumer:
~/kafka/bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
You should see the messages you typed in the producer terminal.
(Optional) Set SELinux to Permissive
If SELinux is enabled in enforcing mode, the Zookeeper and Kafka systemd services (created in the next step) may fail to start with Permission denied errors, because AlmaLinux ships no SELinux policy for these services. Until you create a custom policy module, set SELinux to permissive mode:
sudo setenforce 0
Note that setenforce only lasts until the next reboot; to make the change persistent, set SELINUX=permissive in /etc/selinux/config.
Step 8: Setting Up Kafka as a Systemd Service
To ensure Kafka and Zookeeper start on boot, you can set them up as systemd services.
Create a new systemd service file for Zookeeper:
sudo vim /etc/systemd/system/zookeeper.service
Add the following content:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
After=network.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Create Kafka Systemd Service
Create a new systemd service file for Kafka:
sudo vim /etc/systemd/system/kafka.service
Add the following content:
[Unit]
Description=Apache Kafka server
Documentation=http://kafka.apache.org/documentation.html
After=network.target zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Start and Enable the Services
Stop any Zookeeper and Kafka processes you started manually in Step 6 (press Ctrl+C in each terminal), then reload systemd to apply the new service files:
sudo systemctl daemon-reload
Start and enable Zookeeper:
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
Start and enable Kafka:
sudo systemctl start kafka
sudo systemctl enable kafka
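After enabling both units, it is worth confirming they came up cleanly; systemctl status and journalctl are the standard tools for this:

```shell
# Check both services and show the most recent Kafka log lines.
sudo systemctl status zookeeper kafka --no-pager
sudo journalctl -u kafka -n 20 --no-pager
```

If either service shows "failed", the journal output usually points at the cause (a port already in use by the manually started processes from Step 6 is a common one).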
Conclusion
You have now installed Apache Kafka on AlmaLinux 9. You can create topics, produce and consume messages, and manage Kafka and Zookeeper as systemd services. This setup provides a solid foundation for building real-time data pipelines and streaming applications.
For further configuration and tuning, refer to the official Kafka documentation.