Tech Threads

Coupling Schema Registry (Confluent) with Multi-Broker Apache Kafka Cluster


This article explains the steps for coupling Confluent Schema Registry with an existing, operational multi-broker Apache Kafka cluster (local deployment). The Confluent Platform is an integrated bundle that ships Apache Kafka together with several other components: ksqlDB for stream processing, numerous connectors (database, file, AWS, Azure, Google, etc.), Schema Registry, Control Center, and more. The Confluent Platform documentation covers these in more detail.

In short, Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, and allows schemas to evolve. It supports Avro, JSON Schema, and Protobuf schemas. The importance of Schema Registry in Kafka-based data pipelines is covered in a separate post.

NOTE: The Schema Registry integration for Kafka is not part of the open source Apache Kafka ecosystem. You can run it locally by downloading the prebuilt version of Schema Registry as part of the Confluent Platform, or by building a development version with Maven. The source code is available on GitHub at https://github.com/confluentinc/schema-registry under the Confluent Community License.

Article Structure

This article is divided into five parts:

  1. Assumptions about the operational multi-broker Kafka cluster
  2. Download and install the Confluent platform
  3. Independent configuration and verification of Schema Registry
  4. Posting or registering a new version of JSON schemas through the CLI/terminal
  5. A few API usages on Schema Registry’s built-in RESTful interface through a browser plug-in

  1. Assumptions:

Here I am considering a four-node cluster in which every node is already running Kafka 2.6.0 with ZooKeeper 3.5.6 on Ubuntu 14.04 LTS with Java 1.8.0_101. The cluster has four brokers and two topics, each topic with three partitions.

Note:- Confluent Schema Registry can be installed and run outside the Apache Kafka cluster. Because hardware limitations prevented me from adding a dedicated node for Schema Registry, I selected a healthy node in the existing Kafka cluster, with 16 GB RAM and a 1 TB hard disk, to run it.

  2. Download and Install Confluent platform

Although the Confluent platform bundles Kafka, ZooKeeper, ksqlDB, Schema Registry, and more, here we integrate only its Schema Registry with the existing, operational Apache Kafka cluster. I downloaded the prebuilt version confluent-community-5.5.0-2.12.tar, which is distributed under the Confluent Community License (https://www.confluent.io/confluent-community-license-faq/). This procedure is not recommended for commercial or production use without a valid license from Confluent, so read up on the Confluent licenses before going further. Alternatively, you can get the source from https://github.com/confluentinc/schema-registry and build it yourself.

  3. Independent configuration and verify/run Schema Registry

As mentioned in the assumptions, copy confluent-community-5.5.0-2.12.tar to the selected node and extract (untar) it with root privileges under /usr/local/.
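The extraction step can be sketched as below. This is only a minimal sketch: it assumes the tarball has been copied into the current directory (my assumption, the location is not fixed by anything above) and that sudo is available on the node.

```shell
# Assumed location: the tarball sits in the current directory (hypothetical).
TARBALL="confluent-community-5.5.0-2.12.tar"
INSTALL_PREFIX="/usr/local"

if [ -f "$TARBALL" ]; then
  # -x extract, -f archive file, -C directory to extract into
  sudo tar -xf "$TARBALL" -C "$INSTALL_PREFIX"
  # The archive unpacks into a confluent-5.5.0 directory
  ls "$INSTALL_PREFIX/confluent-5.5.0"
else
  echo "Tarball $TARBALL not found in $(pwd)" >&2
fi
```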

Navigate to /usr/local/confluent-5.5.0/etc/schema-registry and edit the schema-registry.properties file: set the key kafkastore.connection.url to the host and port of each ZooKeeper server, as a comma-separated list.

Alternatively, the key kafkastore.bootstrap.servers can be used instead of ZooKeeper by listing the host and port of every Kafka broker in the cluster. The next key, kafkastore.topic, was left at its default value, “_schemas”. Schema Registry uses this compacted topic to store all the schemas, and the topic is created automatically in the Apache Kafka cluster when the Schema Registry server starts for the first time.
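As a rough illustration of the keys discussed above, the snippet below writes a sample fragment to a scratch file rather than touching the real schema-registry.properties. All host names are hypothetical placeholders, and the listeners and kafkastore.topic values are the stock defaults as far as I know, so keep whatever your own file already contains for those.

```shell
# Scratch file for illustration only; the real file lives at
# /usr/local/confluent-5.5.0/etc/schema-registry/schema-registry.properties
SKETCH_FILE="$(mktemp)"

cat > "$SKETCH_FILE" <<'EOF'
# Address the REST interface listens on (8081 is the port used in this article)
listeners=http://0.0.0.0:8081

# Option A: ZooKeeper servers, comma separated (hypothetical host names)
kafkastore.connection.url=zk-node1:2181,zk-node2:2181,zk-node3:2181

# Option B: Kafka brokers directly, instead of ZooKeeper (hypothetical host names)
#kafkastore.bootstrap.servers=PLAINTEXT://kafka-node1:9092,kafka-node2:9092,kafka-node3:9092,kafka-node4:9092

# Topic in which Schema Registry stores the schemas (assumed stock default)
kafkastore.topic=_schemas
EOF

# Show only the active (non-comment, non-blank) settings
grep -Ev '^(#|$)' "$SKETCH_FILE"
```

Only one of option A or option B needs to be active; the sketch keeps the ZooKeeper form enabled because that is the one used in this walkthrough.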

To run Schema Registry, navigate to the bin directory under confluent-5.5.0 and execute the script “schema-registry-start”, passing the location of schema-registry.properties as a parameter.
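A minimal sketch of that start command follows, assuming the installation path /usr/local/confluent-5.5.0 used earlier. The existence check is my own addition so the snippet fails gracefully on a machine without the installation.

```shell
# Installation path as used in this article
CONFLUENT_HOME="/usr/local/confluent-5.5.0"
START_SCRIPT="$CONFLUENT_HOME/bin/schema-registry-start"
PROPS_FILE="$CONFLUENT_HOME/etc/schema-registry/schema-registry.properties"

if [ -x "$START_SCRIPT" ]; then
  # Runs in the foreground and logs to this terminal; stop it with Ctrl+C
  "$START_SCRIPT" "$PROPS_FILE"
else
  echo "schema-registry-start not found under $CONFLUENT_HOME" >&2
fi
```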

Schema Registry then starts and prints its startup messages in the same console/terminal.

To make sure Confluent Schema Registry is up and running with its RESTful interface, open the following URL in a browser; the response should be HTTP 200 OK.

http://<IP Address of the node where Schema Registry Installed>:8081/subjects
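The same check can be done from a terminal. The sketch below only defines a small helper, sr_status (a name I made up for illustration), that prints the HTTP status code of GET /subjects. The address 127.0.0.1 is a placeholder for the node where Schema Registry runs.

```shell
# Hypothetical address; replace with the IP of the node running Schema Registry
SR_HOST="127.0.0.1"
SR_URL="http://${SR_HOST}:8081"

# Print only the HTTP status code returned by GET /subjects.
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints the status code, --max-time caps the wait.
sr_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${SR_URL}/subjects"
}
```

Calling sr_status should print 200 once the registry is up; curl prints 000 when it cannot connect at all.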

To save time, we can install a REST client browser plug-in to execute GET requests; which one depends on the browser. Since I use Mozilla Firefox, I installed “RESTED” (https://addons.mozilla.org/en-US/firefox/addon/rested/), a REST client extension for Firefox. Similarly, for Google Chrome, Advanced REST Client can be used.

  4. Posting or Registering new version of JSON schemas through CLI/Terminal

Confluent Schema Registry’s RESTful interface can be used to store and retrieve Avro, JSON Schema, and Protobuf schemas. Here I consider a schema written in JSON and store it in Schema Registry from the terminal/CLI. As a simple example, an Order Details schema is created and stored in Schema Registry under the subject Orders. To achieve this, I followed these steps:

  • Designed a simple/dummy Order Details schema in JSON with four fields (id, amount, payment_type, and customer_email), and then escaped its double quotes:

{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}

Many free online tools, such as https://www.freeformatter.com/json-formatter.html, are available for JSON formatting, JSON string escaping, and so on.

  • '{"schema": ""}' is the template for storing a schema in Schema Registry. The escaped Order Details JSON goes inside the inner double quotes (""):

'{"schema": "{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}"}'

  • Here is the complete command posted from the CLI/terminal to Confluent Schema Registry to store the new schema. If it succeeds, the schema id is returned and displayed. Note that the request body carries no schemaType field, so Schema Registry registers the schema as Avro, its default type; the record definition used here is Avro syntax.

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}"}' http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions

Note:- The Order Details schema is stored under the subject Orders, which may hold multiple versions, each with its own id, if the schema is later updated with new fields or other modifications.
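Instead of an online tool, the escaping and the payload template can also be handled in the shell itself. The following is a minimal sketch, not the only way to do it: the address 127.0.0.1 is a placeholder, and register_schema is a helper name I made up for illustration.

```shell
# Hypothetical address; replace with the node running Schema Registry
SR_URL="http://127.0.0.1:8081"
SUBJECT="Orders"

# The Order_Details schema from this article, in plain (unescaped) form
SCHEMA='{"type":"record","name":"Order_Details","namespace":"dataview.in","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"},{"name":"payment_type","type":"string"},{"name":"customer_email","type":"string"}]}'

# Escape every double quote so the schema can be embedded as a JSON string
ESCAPED_SCHEMA=$(printf '%s' "$SCHEMA" | sed 's/"/\\"/g')

# Wrap the escaped schema in the {"schema": "..."} template
PAYLOAD=$(printf '{"schema": "%s"}' "$ESCAPED_SCHEMA")

# POST the payload to the subject's versions endpoint.
# No schemaType field is sent, so the registry applies its default type (Avro).
register_schema() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$PAYLOAD" \
    "${SR_URL}/subjects/${SUBJECT}/versions"
}

# Print the payload for inspection before sending it
printf '%s\n' "$PAYLOAD"
```

Running register_schema against a live registry sends the request and prints the response, which contains the schema id. The sed-based escaping only handles double quotes on a single line, which is enough for this schema; a multi-line or more complex schema would need a proper JSON tool.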

  5. Few API usages on Schema Registry’s built-in RESTful interface through browser plug-in

As mentioned in step 3, we installed RESTED (a REST client) in the Firefox browser and used it to verify four basic API calls through the RESTful interface. The same can be done from the CLI/terminal.

  • List all the subjects

The following command returns the same response on the terminal:

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects

  • Get or display top level config

Similarly, from the CLI/terminal:

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/config

  • Fetch the most recently registered schema under the subject “Orders”

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions/latest

  • List the versions of the schema registered under the subject “Orders”

Since we have only just registered Order Details under the subject Orders and have not modified it, only version 1 is returned.

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions

I hope you enjoyed this read. Please like and share if you found this article valuable.

Reference :- https://docs.confluent.io/current/schema-registry/docs/index.html


Written by
Gautam Goswami

Can be reached for real-time POC development and hands-on technical training at gautambangalore@gmail.com, as well as for designing, developing, and helping with any Hadoop/Big Data handling related task. Gautam is an advisor and an educator. Before that, he worked as a Sr. Technical Architect across different technologies and business domains in numerous countries.
He is passionate about sharing knowledge through blogs and training workshops on Big Data related technologies and systems.
