Beyond its core publish-subscribe messaging functionality, Apache Kafka has gained outstanding momentum as a distributed event streaming platform. It is leveraged for high-performance data pipelines, streaming analytics, data integration, and more, and it serves as the backbone of IoT data platforms that must ingest massive amounts of heterogeneous data.
Yet despite Kafka's tremendous usefulness for data transportation, and the numerous Kafka connectors for importing and exporting data to and from other data systems, the client library remains an obstacle: it is Java-only, which makes it hard for applications written in other programming languages to act as Kafka producers or consumers.
Confluent’s Kafka REST Proxy and its importance
Apache Kafka offers its functionality through a well-defined set of APIs. We interact with those APIs through the client library, but the officially shipped client library is limited to Java. Even though Kafka provides a bunch of CLI tools, they are just thin wrappers over the Java client library. Due to this limitation, many programming languages lack officially supported, production-grade client libraries for Kafka. The Confluent Platform, which uses Apache Kafka as its central nervous system, officially supports client libraries for C/C++, C#, Go, and Python. To overcome this bottleneck, Confluent developed a REST Proxy module that exposes Apache Kafka's APIs via HTTP. Since HTTP is a widely available and universally supported protocol, any programming language can interact with the Apache Kafka APIs without depending on Kafka's client library. The Kafka REST Proxy works as a RESTful web API: any application, regardless of programming language, that can send and receive HTTP/HTTPS messages can be configured to operate with a Kafka cluster. The Confluent REST Proxy is a component, or sub-module, of the larger Confluent Platform.
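To make the "any language that speaks HTTP" point concrete, here is a minimal sketch in plain standard-library Python that lists the cluster's topics through the proxy's `GET /topics` endpoint. The base URL is an assumption for this setup; no Kafka client library is involved:

```python
import json
import urllib.request


def list_topics(base_url: str) -> list:
    """Return the topic names known to the REST Proxy (GET /topics)."""
    req = urllib.request.Request(
        f"{base_url}/topics",
        # v2 API content type for metadata requests
        headers={"Accept": "application/vnd.kafka.v2+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (hypothetical proxy address for this cluster):
# list_topics("http://192.168.10.110:8082")
```

Any other language with an HTTP client (Go, C#, JavaScript, even `curl` from a shell, as shown later) can do exactly the same.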
To use the Confluent REST Proxy module independently with an operational multi-node, multi-broker Apache Kafka cluster, we can either download the source code from the GitHub repository and build it ourselves, or download a prebuilt version shipped as part of the Confluent Platform.
As a side note, the Confluent REST Proxy project is licensed under the Confluent Community License; the REST integration is not part of the open-source Apache Kafka distribution.
In this article, I detail the steps to integrate a prebuilt version of the Confluent REST Proxy with a running multi-broker Apache Kafka cluster, and then publish and consume simple messages from a terminal/CLI without using any of Kafka's built-in scripts. For better readability, the article has been segmented into four parts.
Assumptions
The cluster consists of three nodes, each already installed with and running Apache Kafka version 2.6.0, with a ZooKeeper server (v3.5.6) running as a separate instance, all on Ubuntu 14.04 LTS. Every node in the cluster is configured with Java version 1.8.0_101. The Kafka cluster has three brokers and two topics, and each topic has three partitions. There is no need to change any existing topic or partition configuration to plug in the REST Proxy module.
The Confluent REST Proxy can be installed and run on a separate node outside the Kafka cluster. Due to hardware limitations, I did not scale the existing cluster horizontally with an additional node for the REST Proxy; instead, I selected a healthy node in the existing Kafka cluster, with 16 GB RAM and a 1 TB hard disk, to run it.
Download and install the Confluent Platform
Even though the Confluent Platform bundles Apache Kafka, ZooKeeper, ksqlDB, the REST Proxy, the Schema Registry, etc. together as a suite, here only the REST Proxy will be hooked up to the independently running Kafka cluster. In short, this approach avoids migrating the stored messages in existing topics from the operational Kafka cluster to the Confluent Platform.
I downloaded the prebuilt version confluent-community-5.5.0-2.12.tar from here, under the Confluent Community License. This procedure is not recommended for commercial/production use without a valid license from Confluent.
Configure, verify, and run Kafka REST
As mentioned above, copy and extract/untar confluent-community-5.5.0-2.12.tar with root privileges under /usr/local/.
Navigate to /usr/local/confluent-5.5.0/etc/kafka-rest and modify the kafka-rest.properties file to update the values of the following keys:
# id
Defines a unique ID for this Confluent REST Proxy server instance. It is also used to generate unique IDs for consumers that do not specify their own.
#schema.registry.url
The URL of the Schema Registry, which can be installed and run on a separate node outside the Apache Kafka cluster. The Schema Registry holds a versioned history of all schemas used by serializers such as Avro, JSON Schema, and Protobuf when producers submit messages with complex data types, and consumers subsequently use it to decode the consumed messages. You can go through this link to learn more about the importance and configuration of the Schema Registry with an Apache Kafka cluster.
In this integration, the Schema Registry URL is not provided because I decided not to publish any messages with complex data types. Instead, basic data types such as String and int will be used when publishing messages to the topic from a terminal or a browser REST-client plug-in. In a production environment, however, configuring the Schema Registry URL is highly recommended, as it allows schemas to evolve.
#zookeeper.connect
The comma-separated list of ZooKeeper servers.
#bootstrap.servers
The comma-separated list of Kafka brokers to connect to in the multi-node cluster. For a single-node Kafka cluster, it would be:
bootstrap.servers=PLAINTEXT://localhost:9091
Here is our kafka-rest.properties
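Pulling the keys above together, a minimal kafka-rest.properties for a three-node setup like this one might look as follows. The ID, host names, and ports are placeholders (only 192.168.10.110 appears elsewhere in this article); adjust them to your own cluster, and uncomment schema.registry.url only if you run a Schema Registry:

```properties
# Unique ID of this REST Proxy instance
id=kafka-rest-server-1

# Not used in this setup; enable when a Schema Registry is running
#schema.registry.url=http://localhost:8081

# Comma-separated list of ZooKeeper servers
zookeeper.connect=192.168.10.110:2181,192.168.10.111:2181,192.168.10.112:2181

# Comma-separated list of Kafka brokers
bootstrap.servers=PLAINTEXT://192.168.10.110:9092,PLAINTEXT://192.168.10.111:9092,PLAINTEXT://192.168.10.112:9092
```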
Since we are not using Confluent Control Center, the rest of the keys can be left commented out.
To run the Kafka REST Proxy, navigate to the bin directory under confluent-5.5.0 and execute the kafka-rest-start script with the location of kafka-rest.properties as a parameter:
:/usr/local/confluent/bin$ ./kafka-rest-start ../etc/kafka-rest/kafka-rest.properties
Eventually, the Kafka REST Proxy starts, printing its log messages to the same console/terminal.
To make sure the Kafka REST Proxy is up and running, open a new terminal and run the following command to list the topics already created on the cluster:
~$ curl -X GET http://<IP address of the node where the REST Proxy is running>:8082/topics
Producing and Consuming messages via terminal/CLI
In this section, I demonstrate how easily we can publish and consume messages from a topic without writing a single line of Java and without executing built-in scripts such as kafka-console-producer.sh and kafka-console-consumer.sh.
– Publishing a simple message in JSON data format to a specific topic
Typing the following command in the terminal as a request produced a response with no error:
curl -X POST http://<IP address of the node where the Kafka REST Proxy is running>:8082/topics/<name of the topic> --data '{"records": [{"key": "firstMSG","value": "Sending Msg from through REST Proxy"}]}' --header "Content-Type: application/vnd.kafka.json.v2+json"
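The same POST can of course be issued from code rather than curl. Below is a hedged standard-library Python sketch that builds the v2 produce request shown above; the base URL and topic name in the example call are placeholders from this article's setup:

```python
import json
import urllib.request

JSON_V2 = "application/vnd.kafka.json.v2+json"


def build_produce_request(base_url: str, topic: str,
                          records: list) -> urllib.request.Request:
    """Build a POST /topics/{topic} request in the REST Proxy v2 JSON format."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": JSON_V2},
        method="POST",
    )


def produce(base_url: str, topic: str, records: list) -> dict:
    """Send the records; the proxy replies with the partition/offset per record."""
    with urllib.request.urlopen(build_produce_request(base_url, topic, records)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (hypothetical proxy address):
# produce("http://192.168.10.110:8082", "Auguest",
#         [{"key": "firstMSG", "value": "Sending Msg from through REST Proxy"}])
```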
– Consume the produced message
i. To consume the message published above, we first need to create a consumer instance in a consumer group. I named the consumer group 'dataviewConsumer' and the consumer instance 'dataview_consumer'.
Request:
curl -X POST http://192.168.10.110:8082/consumers/dataviewConsumer --data '{"name": "dataview_consumer", "format": "json", "auto.offset.reset": "earliest"}' --header "Content-Type: application/vnd.kafka.json.v2+json"
Response:
{"instance_id":"dataview_consumer","base_uri":"http://192.168.10.110:8082/consumers/dataviewConsumer/instances/dataview_consumer"}
ii. Subscribe the consumer instance to the topic "Auguest":
curl -X POST http://192.168.10.110:8082/consumers/dataviewConsumer/instances/dataview_consumer/subscription --data '{"topics": ["Auguest"]}' --header "Content-Type: application/vnd.kafka.json.v2+json"
iii. And finally, consume the produced message. Consumer instances are removed if idle for 5 minutes, so publish the message and consume it soon afterwards to verify.
Request:
curl -X GET http://192.168.10.110:8082/consumers/dataviewConsumer/instances/dataview_consumer/records --header "Accept: application/vnd.kafka.json.v2+json"
Response:
[{"topic":"Auguest","key":"firstMSG","value":"Sending Msg from through REST Proxy","partition":0,"offset":1}]
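The consumer steps above map onto the v2 consumer endpoints: create the instance, subscribe it, poll for records, and (ideally) delete it when done rather than waiting for the five-minute idle reaper. A hedged standard-library Python sketch of the same flow, reusing the group and instance names from the examples above (the base URL is a placeholder):

```python
import json
import urllib.request

JSON_V2 = "application/vnd.kafka.json.v2+json"


def _post(url: str, payload: dict) -> urllib.request.Request:
    """Helper: build a POST request with a JSON v2 body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": JSON_V2},
        method="POST",
    )


def create_consumer(base_url: str, group: str, name: str) -> urllib.request.Request:
    """POST /consumers/{group} -- register a consumer instance."""
    return _post(f"{base_url}/consumers/{group}",
                 {"name": name, "format": "json", "auto.offset.reset": "earliest"})


def subscribe(base_url: str, group: str, name: str, topics: list) -> urllib.request.Request:
    """POST /consumers/{group}/instances/{name}/subscription."""
    return _post(f"{base_url}/consumers/{group}/instances/{name}/subscription",
                 {"topics": topics})


def poll_records(base_url: str, group: str, name: str) -> urllib.request.Request:
    """GET /consumers/{group}/instances/{name}/records."""
    return urllib.request.Request(
        f"{base_url}/consumers/{group}/instances/{name}/records",
        headers={"Accept": JSON_V2},
    )


def delete_consumer(base_url: str, group: str, name: str) -> urllib.request.Request:
    """DELETE the instance so the proxy frees it immediately."""
    return urllib.request.Request(
        f"{base_url}/consumers/{group}/instances/{name}", method="DELETE")


# Each request would be sent with urllib.request.urlopen(...), e.g.:
# urllib.request.urlopen(create_consumer("http://192.168.10.110:8082",
#                                        "dataviewConsumer", "dataview_consumer"))
```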
The Crux
To avoid 404 Not Found responses, we should be watchful when setting the content-type headers on each request. The embedded format in the content type (for example, the json in application/vnd.kafka.json.v2+json) must be the same when creating and configuring the consumer instance as when producing to the Kafka topic.
Final Note:
Confluent Inc. has developed this excellent component as part of its complete event streaming platform.
Ultimately, Confluent's Kafka REST Proxy is most valuable when there are constraints that prevent using the Kafka client library directly.
I hope you have enjoyed this read. Please like and share if you found it valuable.
Gautam can be reached at gautambangalore@gmail.com for real-time POC development and hands-on technical training, as well as for designing, developing, and assisting with Hadoop/Big Data processing, Apache Kafka, streaming data, and related tasks. He is an advisor and an educator. Before that, he served as a Sr. Technical Architect across various technologies and business domains in numerous countries. He is passionate about sharing knowledge through blogs and through workshops on various Big Data technologies, frameworks, and related topics.