Clean Programmer

Programming & DevOps Resources


How to Set Up an Apache Kafka Cluster

July 7, 2018 Monzurul Haque Shimul

Apache Kafka® is a distributed streaming platform. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.

Kafka has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant durable way.
  • Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably get data between systems or applications
  • Building real-time streaming applications that transform or react to the streams of data

In this post, I will discuss setting up a 3-node Kafka cluster.

Download Distribution

Download the 1.1.0 release and un-tar it.

$ tar -xzf kafka_2.11-1.1.0.tgz
$ cd kafka_2.11-1.1.0
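If you are working on a headless server, you can fetch the archive from the command line first. The URL below points at the Apache archive for the 1.1.0 release; you may prefer a closer mirror:

```shell
# Fetch the Kafka 1.1.0 release (Scala 2.11 build) from the Apache archive
wget https://archive.apache.org/dist/kafka/1.1.0/kafka_2.11-1.1.0.tgz
```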

Configuration

Pre-requisite

Both Kafka and ZooKeeper run on Java (JDK 8 or greater). I'm using Java 8. Install your preferred Java version and set your Java home correctly. You can do this easily using SDKMAN! Read my post about SDKMAN to learn how to smartly manage SDKs.
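As a quick sanity check before going further, a small helper like the following can extract the major Java version; the parsing is a sketch that assumes the two common `java -version` numbering schemes:

```shell
#!/bin/sh
# Extract the major Java version from a version string, handling both
# the legacy scheme (1.8.0_202 -> 8) and the modern scheme (11.0.2 -> 11).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy: 1.8.0_202 -> 8
    *)   echo "$1" | cut -d. -f1 ;;   # modern: 11.0.2    -> 11
  esac
}

# On a real node, feed it the installed JVM's version string:
#   ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
#   [ "$(java_major "$ver")" -ge 8 ] || echo "Java 8+ required" >&2
java_major "1.8.0_202"
```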

Zookeeper

Kafka uses ZooKeeper, so you need to start a ZooKeeper ensemble first if you don't already have one. I have already discussed how to set up a ZooKeeper cluster in my previous post; you can follow that to set up your own. Or you can use the ZooKeeper that comes with the Apache Kafka distribution, which is what I will cover here.

Set the hostnames zk01, zk02, and zk03 for the three nodes. On each node, set an environment variable ZK_HOME to the path where you extracted the Kafka distribution.
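Name resolution matters here: every node must be able to reach zk01, zk02, and zk03 by name. If you are not using DNS, /etc/hosts entries along these lines work (the IP addresses are placeholders for your machines, and the path is an example):

```shell
# /etc/hosts on every node (placeholder IPs -- substitute your own):
#   10.0.0.11  zk01
#   10.0.0.12  zk02
#   10.0.0.13  zk03

# Point ZK_HOME at the extracted distribution, e.g. in ~/.profile:
export ZK_HOME="$HOME/kafka_2.11-1.1.0"
```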

Create a file named zk-cluster.sh with the following content and make it executable (chmod +x zk-cluster.sh):
#!/bin/sh
# Usage: ./zk-cluster.sh <myid>  -- myid must match a server.N entry below

MY_ID=$1

mkdir -p "$ZK_HOME/var/zk/data"
mkdir -p "$ZK_HOME/var/zk/log"
mkdir -p "$ZK_HOME/conf"

cat > "$ZK_HOME/conf/zk.properties" << EOF
dataDir=$ZK_HOME/var/zk/data
clientPort=2181
maxClientCnxns=0

server.1=zk01:2888:3888
server.2=zk02:2888:3888
server.3=zk03:2888:3888

# Heartbeat tick in ms; initLimit and syncLimit are counted in ticks
tickTime=2000
initLimit=5
syncLimit=2

autopurge.snapRetainCount=3
autopurge.purgeInterval=1
EOF

# Each node identifies itself by the number in its myid file
cat > "$ZK_HOME/var/zk/data/myid" << EOF
${MY_ID}
EOF

On node zk01, run:

./zk-cluster.sh 1

On node zk02, run:

./zk-cluster.sh 2

On node zk03, run:

./zk-cluster.sh 3

Now start ZooKeeper on each node by running:

$ screen -S zk
$ cd $ZK_HOME
$ ./bin/zookeeper-server-start.sh conf/zk.properties
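Once all three are up, you can check that the ensemble actually formed a quorum with ZooKeeper's four-letter `srvr` command. This assumes `nc` (netcat) is installed; exactly one node should report leader and the other two follower:

```shell
# Ask each ZooKeeper node for its role in the ensemble
for host in zk01 zk02 zk03; do
  printf '%s: ' "$host"
  echo srvr | nc "$host" 2181 2>/dev/null | grep Mode || echo "unreachable"
done
```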

Kafka

Set the hostnames kafka01, kafka02, and kafka03 for the three nodes. On each node, set an environment variable KAFKA_HOME to the path where you extracted the Kafka distribution.
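As with the ZooKeeper nodes, make sure kafka01, kafka02, and kafka03 resolve from every node, and also from any clients, since the advertised listeners below hand these hostnames out to clients. The IPs and path here are placeholders:

```shell
# /etc/hosts on every node and every client (placeholder IPs):
#   10.0.0.21  kafka01
#   10.0.0.22  kafka02
#   10.0.0.23  kafka03

# Point KAFKA_HOME at the extracted distribution, e.g. in ~/.profile:
export KAFKA_HOME="$HOME/kafka_2.11-1.1.0"
```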

Create a file named kafka-cluster.sh with the following content and make it executable (chmod +x kafka-cluster.sh):
#!/bin/sh
# Usage: ./kafka-cluster.sh <broker-id> <hostname>

BROKER_ID=$1
HOST=$2

mkdir -p "$KAFKA_HOME/var/kafka/data"
mkdir -p "$KAFKA_HOME/var/kafka/log"
mkdir -p "$KAFKA_HOME/conf"

cat > "$KAFKA_HOME/conf/kafka.properties" << EOF
# Must be unique for every broker in the cluster
broker.id=${BROKER_ID}

listeners=PLAINTEXT://${HOST}:9092
advertised.listeners=PLAINTEXT://${HOST}:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

# log.dirs is where Kafka stores the partition data itself
log.dirs=$KAFKA_HOME/var/kafka/log
num.partitions=1
num.recovery.threads.per.data.dir=1

log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000

zookeeper.connect=zk01:2181,zk02:2181,zk03:2181
zookeeper.connection.timeout.ms=6000
EOF

On node kafka01, run:

./kafka-cluster.sh 1 kafka01

On node kafka02, run:

./kafka-cluster.sh 2 kafka02

On node kafka03, run:

./kafka-cluster.sh 3 kafka03

Now start Kafka on each node by running:

$ screen -S kafka
$ cd $KAFKA_HOME
$ ./bin/kafka-server-start.sh conf/kafka.properties
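To verify the brokers actually formed a cluster, create a topic replicated across all three nodes and push a message through it. The topic name `test` is just an example; note that in Kafka 1.1.0 `kafka-topics.sh` still talks to ZooKeeper rather than the brokers:

```shell
cd "$KAFKA_HOME"

# Create a topic replicated on all three brokers
./bin/kafka-topics.sh --create \
  --zookeeper zk01:2181,zk02:2181,zk03:2181 \
  --replication-factor 3 --partitions 3 --topic test

# Confirm each partition has a leader and three in-sync replicas
./bin/kafka-topics.sh --describe --zookeeper zk01:2181 --topic test

# Produce one message ...
echo "hello, kafka" | ./bin/kafka-console-producer.sh \
  --broker-list kafka01:9092,kafka02:9092,kafka03:9092 --topic test

# ... and read it back
./bin/kafka-console-consumer.sh \
  --bootstrap-server kafka01:9092 --topic test \
  --from-beginning --max-messages 1
```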

Congratulations! You have now set up your own 3-node Kafka cluster.

Conclusion

Kafka is one of the most popular streaming platforms and is widely used across organizations. In this post, I discussed setting up a 3-node Kafka cluster. Thanks for reading, and feel free to comment or share the article.

 

Categories: Apache Kafka Tags: kafka, zookeeper
