Over the past few years, Apache Kafka has emerged as the leading event streaming platform for ingesting streaming data and events. However, in earlier versions of Apache Kafka, ZooKeeper was an additional, mandatory component for managing and coordinating the Kafka cluster. Relying on ZooKeeper in an operational multi-node Kafka cluster introduced complexity and a potential single point of failure. ZooKeeper is a completely separate system with its own configuration file syntax, management tools, and deployment patterns. Deploying and managing two separate distributed systems to get a Kafka cluster up and running requires in-depth skill and experience. Kafka administration expertise alone, without ZooKeeper knowledge, is not enough to recover from a crisis, especially in a production environment where ZooKeeper runs in a completely isolated environment (cloud).
Kafka’s reliance on ZooKeeper for metadata management was eliminated by introducing the Apache Kafka Raft (KRaft) consensus protocol. This removes the need to configure and run two distinct systems, ZooKeeper and Kafka, and significantly simplifies Kafka’s architecture by moving metadata management into Kafka itself. Apache Kafka officially deprecated ZooKeeper in version 3.5, and the latest version, 3.8, improved the KRaft metadata version-related messages. Still, ingested events deliver no business value unless we consume them from the Kafka topic and process them further.
RisingWave, on the other hand, makes processing streaming data easy, dependable, and efficient once events flow into it from a Kafka topic. RisingWave excels at delivering consistently updated materialized views, which are persistent data structures that reflect the results of stream processing through incremental updates.
In this article, I am going to explain step by step how to install and configure the latest version of Apache Kafka, version 3.8, on a single-node cluster running Ubuntu 22.04, and then integrate it with RisingWave, which is installed and configured on the same node.
Assumptions:-
Installation and Configuration of Apache Kafka-3.8 with KRaft:-
In the KRaft server properties file, I set
process.roles=broker,controller
and subsequently node.id=1, num.partitions=5, and delete.topic.enable=true.
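The settings listed above can be applied to a working copy of the properties file with a small helper. This is only a sketch: the `set_prop` function and the `server-kraft-sketch.properties` file name are hypothetical, and a real combined-mode node also needs the listener and `controller.quorum.voters` entries that the stock KRaft template shipped with Kafka 3.8 (`config/kraft/server.properties`) already contains.

```shell
# Hypothetical working copy; point CONF at your real properties file instead.
CONF="${CONF:-server-kraft-sketch.properties}"

# set_prop KEY VALUE FILE: replace an existing KEY=... line, or append one.
set_prop() {
  key=$1 value=$2 file=$3
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"   # GNU sed, as on Ubuntu
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

touch "$CONF"
set_prop process.roles broker,controller "$CONF"   # node acts as broker and controller
set_prop node.id 1 "$CONF"
set_prop num.partitions 5 "$CONF"                  # default partition count for new topics
set_prop delete.topic.enable true "$CONF"

# Show the four values that are now in effect.
grep -E '^(process\.roles|node\.id|num\.partitions|delete\.topic\.enable)=' "$CONF"
```

Calling `set_prop` again with the same key overwrites the earlier value instead of adding a duplicate line, which keeps the file safe to re-run against.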
Start and Verify the cluster:-
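A typical way to bring up and check a single KRaft node is sketched here. It assumes Kafka 3.8 is unpacked under `KAFKA_HOME` (the default path used is a guess) and that the broker listens on `localhost:9092`; the function names are made up for illustration. Formatting the storage directory is a one-time step, and re-running it against an already formatted log directory fails.

```shell
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.13-3.8.0}"   # assumed install path
CONFIG="$KAFKA_HOME/config/kraft/server.properties"  # KRaft template shipped with 3.8

# Generate a cluster ID, format the log directory with it, start the node.
start_kraft_node() {
  cluster_id="$("$KAFKA_HOME/bin/kafka-storage.sh" random-uuid)" || return 1
  "$KAFKA_HOME/bin/kafka-storage.sh" format -t "$cluster_id" -c "$CONFIG" || return 1
  "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$CONFIG"
}

# Ask the node for its KRaft quorum status (leader, voters, lag).
verify_kraft_node() {
  "$KAFKA_HOME/bin/kafka-metadata-quorum.sh" \
    --bootstrap-server localhost:9092 describe --status
}

if [ -x "$KAFKA_HOME/bin/kafka-storage.sh" ]; then
  # The 10 second pause is an arbitrary allowance for startup.
  start_kraft_node && sleep 10 && verify_kraft_node
else
  echo "Kafka not found under $KAFKA_HOME; set KAFKA_HOME and re-run" >&2
fi
```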
Topic Creation:-
Using Apache Kafka’s built-in script kafka-topics.sh, available inside the bin directory, I can create a topic on the running Kafka broker from the terminal. I created one topic named “UPIStream” with 3 partitions. You can read here how to use Kafka’s built-in scripts.
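The topic creation described above could look like the following. The install path, the broker address `localhost:9092`, and the replication factor of 1 (the only value a single-node cluster can satisfy) are assumptions, and the two function names are hypothetical.

```shell
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.13-3.8.0}"   # assumed install path
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"              # assumed broker address
TOPIC="UPIStream"

# Create the topic with 3 partitions; --if-not-exists makes re-runs harmless.
create_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create --if-not-exists \
    --topic "$TOPIC" --partitions 3 --replication-factor 1 \
    --bootstrap-server "$BOOTSTRAP"
}

# Print partition count, leaders and replicas for the topic.
describe_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe \
    --topic "$TOPIC" --bootstrap-server "$BOOTSTRAP"
}

if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
  create_topic && describe_topic
else
  echo "Kafka not found under $KAFKA_HOME; set KAFKA_HOME and re-run" >&2
fi
```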
Make RisingWave functional as a single instance in standalone mode:-
As said above, RisingWave in standalone mode has been installed and configured on the same node where Kafka 3.8 in KRaft mode is operational. RisingWave in standalone mode leverages an embedded SQLite database to store metadata, and keeps data in the file system. Before that, we need to install and configure the PostgreSQL client, as mentioned in the assumptions.
$ curl https://risingwave.com/sh | sh
$ ./risingwave
$ psql -h 127.0.0.1 -p 4566 -d dev -U root
Connecting Kafka broker with RisingWave:-
Here I am going to connect RisingWave to the Kafka broker so that it receives events from the created topic “UPIStream”. To do that, I need to create a source in RisingWave using the CREATE SOURCE command, specifying the connection settings and data format. If I want to persist the data from the Kafka topic in RisingWave, I can use the CREATE TABLE command instead. More parameters are available when connecting to a Kafka broker. You can refer here (https://docs.risingwave.com/docs/current/ingest-from-kafka/) to know more.
To simply connect to the topic “UPIStream”, I ran a CREATE SOURCE statement on the psql terminal.
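A CREATE SOURCE statement for this setup might look like the sketch below. The column list is my assumption, derived from the field names of the UPI events and typed as VARCHAR because every value arrives as a quoted string; the broker address, the `earliest` startup mode, and the `upi_source_sketch.sql` file name are assumptions too. The script writes the statement to a file and applies it only if a RisingWave instance answers on port 4566.

```shell
SQL_FILE="${SQL_FILE:-upi_source_sketch.sql}"   # hypothetical file name

# Write the statement to a file so it can be reviewed before it is applied.
cat > "$SQL_FILE" <<'SQL'
CREATE SOURCE IF NOT EXISTS UPI_Transaction_Stream (
    "timestamp" VARCHAR,
    upiID       VARCHAR,
    name        VARCHAR,
    amount      VARCHAR,
    currency    VARCHAR,
    deviceOS    VARCHAR,
    targetApp   VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'UPIStream',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
SQL

# Apply it only when psql is installed and RisingWave is reachable.
if command -v psql >/dev/null 2>&1 &&
   psql -w -h 127.0.0.1 -p 4566 -d dev -U root -c 'SELECT 1' >/dev/null 2>&1; then
  psql -w -h 127.0.0.1 -p 4566 -d dev -U root -f "$SQL_FILE"
else
  echo "RisingWave not reachable; statement left in $SQL_FILE" >&2
fi
```

Swapping CREATE SOURCE for CREATE TABLE in the same statement is the variant to use when the events should be persisted in RisingWave.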
Continuous pushing of events from Kafka topic to RisingWave:-
Using a simulator developed in Java, I published a stream of UPI transaction events to the created topic “UPIStream” at an interval of 0.5 seconds, in the following JSON format. Here is one sample event:
{"timestamp":"2024-08-20 22:39:20.866","upiID":"9902480505@pnb","name":"Brahma Gupta Sr.","note":" ","amount":"2779.00","currency":"INR","Latitude":"22.5348319","Longitude":"15.1863628","deviceOS":"iOS","targetApp":"GPAY","merchantTransactionId":"3619d3c01f5ad14f521b320100d46318b9","merchantUserId":"11185368866533@sbi"}
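For a quick smoke test without the Java simulator, the sample event can be pushed by hand with Kafka's console producer. This is not the simulator used in this article, only a sketch that assumes the same install path and broker address as before; `publish_event` is a hypothetical helper name.

```shell
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.13-3.8.0}"   # assumed install path
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"              # assumed broker address
TOPIC="UPIStream"

# The sample event, kept on one line so it is sent as a single record.
EVENT='{"timestamp":"2024-08-20 22:39:20.866","upiID":"9902480505@pnb","name":"Brahma Gupta Sr.","note":" ","amount":"2779.00","currency":"INR","Latitude":"22.5348319","Longitude":"15.1863628","deviceOS":"iOS","targetApp":"GPAY","merchantTransactionId":"3619d3c01f5ad14f521b320100d46318b9","merchantUserId":"11185368866533@sbi"}'

# The console producer reads records line by line from standard input.
publish_event() {
  printf '%s\n' "$EVENT" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC"
}

if [ -x "$KAFKA_HOME/bin/kafka-console-producer.sh" ]; then
  publish_event
else
  echo "Kafka not found under $KAFKA_HOME; set KAFKA_HOME and re-run" >&2
fi
```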
Verifying and analyzing events on RisingWave:-
Move to the psql terminal that is already connected to the RisingWave single instance, which is consuming all the published events from the Kafka topic “UPIStream” and exposing them through the source “UPI_Transaction_Stream”. Meanwhile, the Java simulator keeps running and publishes individual events with different data to the topic “UPIStream” every 0.5 seconds, and each event is subsequently ingested into the RisingWave instance for further processing and analysis.
After processing or modifying the events using materialized views, I could sink those events back to a different Kafka topic so that downstream applications can consume them for further analytics. I’ll cover this in my upcoming blog, so please stay tuned :).
Since I have not done any processing, modification, or computation on the ingested events in the running RisingWave instance, I created a simple materialized view to observe a few fields of the events and confirm whether the integration of Apache Kafka in KRaft mode with RisingWave works. And the answer is a big YES :).
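A simple materialized view of this kind could be defined as sketched here. The view name `upi_txn_overview`, the selected columns, and the `upi_mv_sketch.sql` file name are illustrative assumptions; the sketch presumes a source named UPI_Transaction_Stream that exposes these fields as unquoted (and therefore lower-cased) columns.

```shell
MV_FILE="${MV_FILE:-upi_mv_sketch.sql}"   # hypothetical file name

# A pass-through view over a few fields, plus a query to peek at it.
cat > "$MV_FILE" <<'SQL'
CREATE MATERIALIZED VIEW IF NOT EXISTS upi_txn_overview AS
SELECT upiid, name, amount, currency, targetapp
FROM upi_transaction_stream;

SELECT * FROM upi_txn_overview LIMIT 5;
SQL

# Apply it only when psql is installed and RisingWave is reachable.
if command -v psql >/dev/null 2>&1 &&
   psql -w -h 127.0.0.1 -p 4566 -d dev -U root -c 'SELECT 1' >/dev/null 2>&1; then
  psql -w -h 127.0.0.1 -p 4566 -d dev -U root -f "$MV_FILE"
else
  echo "RisingWave not reachable; statements left in $MV_FILE" >&2
fi
```

Running the SELECT a few times while events keep arriving is an easy way to see that the view is updated incrementally.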
Final Note:-
Especially for the on-premises deployment of a multi-node Kafka cluster, Apache Kafka 3.8 is an excellent release where we completely bypass the ZooKeeper dependency. Besides, it’s easy to set up a development environment for those who want to explore more about event streaming platforms like Apache Kafka. On the other hand, RisingWave functions as a streaming database that innovatively utilizes materialized views to power continuous analytics and data transformations for time-sensitive applications like alerting, monitoring, and trading. Ultimately, it’s becoming a game-changer as Apache Kafka joins forces with RisingWave to unlock business value from real-time stream processing.
I hope you enjoyed reading this. If you found this article valuable, please consider liking and sharing it.
Gautam can be reached for real-time POC development and hands-on technical training at gautambangalore@gmail.com, as well as for design, development, and help with any task related to Hadoop/Big Data, Apache Kafka, streaming data, etc. Gautam is an advisor and an educator. Before that, he worked as a Sr. Technical Architect across different technologies and business domains in numerous countries.
He is passionate about sharing knowledge through blogs and training workshops on various Big Data technologies and systems.