This article aims to explain the steps to coupling Confluent Schema Registry with existed/operational multi-broker Apache Kafka cluster(Local deployment). The Confluent is an integrated platform bundle with Apache Kafka and multiple different components starting from ksqlDB for stream processing, numerous connectors (Database, File, AWS, Azure, Google, etc), Schema Registry, Control Center, etc. Please click here to know more about the Confluent Platform.
In short, Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, allows the evolution of schemas, etc. It supports Avro, JSON Schema, and Protobuf schemas. Can read here about the importance of Schema Registry on Kafka Based Data Pipelines
NOTE: The Schema Registry integration for Kafka is not part of the Open Source Apache Kafka ecosystem. Can execute this locally by downloading the prebuilt versions of the schema registry as part of the Confluent Platform or by building a development version with Maven. The source code in GitHub is available at https://github.com/confluentinc/schema-registry under Confluent Community License.
Article Structure
This article has segmented into five parts:
Here I am considering four nodes in the cluster and each one is already installed and running Kafka of version 2.6.0 with Zookeeper (V 3.5.6) on top of OS Ubuntu 14.04 LTS and java version “1.8.0_101”. Besides, configured four brokers with two topics and each topic with three partitions.
Note:- Confluent Schema Registry can be installed and run outside of the Apache Kafka cluster. Due to hardware limitation to append another node for Schema Registry in the Kafka cluster, I have selected a healthy node in the existing Kafka cluster that having 16GB RAM and 1 TB HD for Schema Registry to run.
Here we will be integrating only Schema Registry available inside the Confluent platform with the existing/operational Apache Kafka cluster even though the Confluent platform accommodates Kafka, Zookeeper, KSqlDB, Schema Registry etc. Downloaded prebuilt version confluent-community-5.5.0-2.12.tar from here under Confluent Community License(https://www.confluent.io/confluent-community-license-faq/). This procedure is not recommended for commercial/ production use without a valid license from Confluent. You can read here in detail about Confluent Licenses. Besides, can visit https://github.com/confluentinc/schema-registry for the source and build subsequently.
As mentioned in assumption, copy and extract/untar the confluent-community-5.5.0-2.12.tar with root privilege under /usr/local/
Navigated to /usr/local/confluent-5.5.0/etc/schema-registry and modified schema-registry.properties file to update the key kafkastore.connection.url with multiple zookeeper server host and port with comma separated value.
The value for the key kafkastore.bootstrap.servers can be used alternatively without Zookeeper by mentioning the host and port of all the Kafka broker in the cluster. The value of the next key kafkastore.topic was not updated and kept as default “compact“. The topic named compact would be used by the Schema Registry to store all the schemas and this topic would be created automatically in the Apache Kafka cluster when start the Schema Registry server for the first time.
To run the Schema Registry, navigate to the bin directory under confluent-5.5.0 and execute the script “schema-registry-start” with the location of the schema-registry.properties as a parameter.
and eventually, Schema Registry will start with the following messages in the same console/terminal.
To make sure Confluent Schema Registry is up and running with RESTful interface, we can hit the following URL from the browser and get the response as the HTTP 200 OK.
http://<IP Address of the node where Schema Registry Installed>:8081/subjects
We can install the REST client browser plug-in to execute GET requests to save time depending upon the type of browser choice. Since I used Firefox Mozilla, plugged in “RESTED”(https://addons.mozilla.org/en-US/firefox/addon/rested/) as a Firefox extension for a REST client. Similarly, for the Google Chrome browser, Advanced REST Client can be used.
Confluent Schema Registry’s RESTFul interface can be leveraged to store and retrieve AVRO, JSON Schema, and Protobuf Schemas. Here I considered JSON Schema and subsequently created or store few new JSON Schema using terminal or CLI on the Schema Registry. As a simple example, one Order Details JSON Schema has been created and stored in Schema Registry under subject Orders. To achieved, followed the following steps
and subsequently reformatted with the escape character.
{\”type\”:\”record\”,\”name\”:\”Order_Details\”,\”namespace\”:\”dataview.in\”,\”fields\”:[{\”name\”:\”id\”,\”type\”:\”string\”},{\”name\”:\”amount\”,\”type\”:\”double\”},{\”name\”:\”payment_type\”,\”type\”:\”string\”}, {\”name\”:\”customer_email\”,\”type\”:\”string\”}]}
Many free online tools are available like https://www.freeformatter.com/json-formatter.html for JSON formatting, JSON String escapes, etc to execute the above.
curl -X POST -H “Content-Type: application/vnd.schemaregistry.v1+json” –data ‘{“schema”: “{\”type \”:\”record\”,\”name\”:\”Order_Details\”,\”namespace\”:\”dataview.in\”,\”fields\”:[{\”name\”:\”id\”,\”type\”:\”string\”},{\”name\”:\”amount\”,\”type\”:\”double\”},{\”name\”:\”payment_type\”,\”type\”:\”string\”}, {\”name\”:\”customer_email\”,\”type\”:\”string\”}]}“}’ http://<IP Address of node where Schema Registry is running>:8081//subjects/Orders/versions
Note:- Order Details schema stored under the subject Orders, might have multiple version with id if Order Details Schema gets updated later with new fields or due to other modification.
As mentioned in step 3, we installed/plugged in RESTED (REST Client) on the Firefox browser and hit the URL to verify four basic API usage through RESTful interface. Same can be done through CLI or from terminal.
and following command can be used on the terminal to get the same response instantly
$ curl -X GET http://< IP Address of node where Schema Registry is running>:8081/subjects
Similarly from CLI or Terminal
$ curl -X GET http://< IP Address of node where Schema Registry is running>:8081/config
$ curl -X GET http:// < IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions/latest
Since we have newly registered Order details under subject Order and not done any changes or modification on top of it so returning only 1 version.
$ curl -X GET http://< IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions
Expectation you have appreciated this perused. Please like and share if you feel this composed is valuable.
Reference :- https://docs.confluent.io/current/schema-registry/docs/index.html
Can be reached for real-time POC development and hands-on technical training at gautambangalore@gmail.com. Besides, to design, develop just as help in any Hadoop/Big Data handling related task. Gautam is a advisor and furthermore an Educator as well. Before that, he filled in as Sr. Technical Architect in different technologies and business space across numerous nations.
He is energetic about sharing information through blogs, preparing workshops on different Big Data related innovations, systems and related technologies.
Page: 1 2
A data fabric is an innovative system designed to seamlessly integrate and organize data from… Read More
Big data technologies' quick development has brought attention to the necessity of a smooth transition… Read More
Data is being generated from various sources, including electronic devices, machines, and social media, across… Read More
An Apache Kafka outage occurs when a Kafka cluster or some of its components fail,… Read More
Complex event processing (CEP) is a highly effective and optimized mechanism that combines several sources… Read More
Source:- www.PacktPub.com This book focuses on data science, a rapidly expanding field of study and… Read More