
Coupling Schema Registry (Confluent) with Multi-Broker Apache Kafka Cluster


This article explains the steps to couple Confluent Schema Registry with an existing/operational multi-broker Apache Kafka cluster (local deployment). Confluent is an integrated platform that bundles Apache Kafka with multiple components, such as ksqlDB for stream processing, numerous connectors (database, file, AWS, Azure, Google, etc.), Schema Registry, and Control Center. Please click here to know more about the Confluent Platform.

In short, Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, and allows schemas to evolve over time. It supports Avro, JSON Schema, and Protobuf schemas. You can read here about the importance of Schema Registry in Kafka-based data pipelines.

NOTE: The Schema Registry integration for Kafka is not part of the open-source Apache Kafka ecosystem. You can run it locally by downloading a prebuilt version of Schema Registry as part of the Confluent Platform, or by building a development version with Maven. The source code is available on GitHub at https://github.com/confluentinc/schema-registry under the Confluent Community License.

Article Structure

This article is segmented into five parts:

  1. Assumptions about the existing operational multi-broker Kafka cluster
  2. Downloading and installing the Confluent Platform
  3. Independently configuring and verifying/running Schema Registry
  4. Posting or registering a new version of a JSON schema through the CLI/terminal
  5. A few API usages of Schema Registry’s built-in RESTful interface through a browser plug-in

  1. Assumptions

Here I am considering a four-node cluster where each node already has Kafka version 2.6.0 installed and running with ZooKeeper (v3.5.6) on Ubuntu 14.04 LTS and Java version 1.8.0_101. Besides, four brokers are configured with two topics, each topic having three partitions.

Note: Confluent Schema Registry can be installed and run outside of the Apache Kafka cluster. Due to a hardware limitation that prevented appending another node to the Kafka cluster just for Schema Registry, I selected a healthy node in the existing cluster, with 16 GB RAM and a 1 TB hard disk, to run Schema Registry.

  2. Download and install the Confluent Platform

Even though the Confluent Platform accommodates Kafka, ZooKeeper, ksqlDB, Schema Registry, etc., here we will integrate only the Schema Registry shipped inside it with the existing/operational Apache Kafka cluster. I downloaded the prebuilt version confluent-community-5.5.0-2.12.tar from here under the Confluent Community License (https://www.confluent.io/confluent-community-license-faq/). This procedure is not recommended for commercial/production use without a valid license from Confluent. You can read here in detail about Confluent licenses. Alternatively, you can visit https://github.com/confluentinc/schema-registry for the source and build it yourself.

  3. Independently configure and verify/run Schema Registry

As mentioned in the assumptions, copy and extract/untar confluent-community-5.5.0-2.12.tar with root privileges under /usr/local/.
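
A minimal sketch of those commands, assuming the tarball was downloaded to the current working directory:

$ sudo cp confluent-community-5.5.0-2.12.tar /usr/local/
$ cd /usr/local
$ sudo tar -xf confluent-community-5.5.0-2.12.tar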

Navigate to /usr/local/confluent-5.5.0/etc/schema-registry and modify the schema-registry.properties file to update the key kafkastore.connection.url with the host and port of each ZooKeeper server as comma-separated values.
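
For illustration only, the updated line might look like the following (the IP addresses are hypothetical placeholders; substitute the hosts and ports of your own ZooKeeper ensemble):

kafkastore.connection.url=192.168.10.130:2181,192.168.10.131:2181,192.168.10.132:2181,192.168.10.133:2181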

Alternatively, the key kafkastore.bootstrap.servers can be used without ZooKeeper by listing the host and port of every Kafka broker in the cluster. The next key, kafkastore.topic, was not updated and was kept at its default value “_schemas”. This topic is used by Schema Registry to store all the schemas, and it is created automatically in the Apache Kafka cluster (as a compacted topic) when the Schema Registry server is started for the first time.
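
A sketch of that broker-based alternative (hypothetical addresses again; the PLAINTEXT:// prefix assumes no security protocol is configured on the brokers):

kafkastore.bootstrap.servers=PLAINTEXT://192.168.10.130:9092,PLAINTEXT://192.168.10.131:9092,PLAINTEXT://192.168.10.132:9092,PLAINTEXT://192.168.10.133:9092
kafkastore.topic=_schemas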

To run Schema Registry, navigate to the bin directory under confluent-5.5.0 and execute the script schema-registry-start with the location of schema-registry.properties as a parameter.
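
Concretely, with the installation path used above:

$ cd /usr/local/confluent-5.5.0/bin
$ ./schema-registry-start ../etc/schema-registry/schema-registry.properties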

Eventually, Schema Registry will start, printing its startup messages in the same console/terminal.

To make sure Confluent Schema Registry is up and running with its RESTful interface, we can hit the following URL from a browser and expect an HTTP 200 OK response.

http://<IP Address of the node where Schema Registry Installed>:8081/subjects
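
The same check can be run from a terminal; an HTTP/1.1 200 OK status line in the output confirms the REST interface is reachable:

$ curl -i http://<IP Address of the node where Schema Registry Installed>:8081/subjects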

Depending on the browser of choice, we can install a REST client browser plug-in to execute GET requests and save time. Since I used Mozilla Firefox, I plugged in “RESTED” (https://addons.mozilla.org/en-US/firefox/addon/rested/) as a Firefox extension for a REST client. Similarly, for the Google Chrome browser, Advanced REST Client can be used.

  4. Posting or registering a new version of a JSON schema through the CLI/terminal

Confluent Schema Registry’s RESTful interface can be leveraged to store and retrieve Avro, JSON Schema, and Protobuf schemas. Here I considered a JSON schema and subsequently created/stored a few new JSON schemas on the Schema Registry using the terminal/CLI. As a simple example, an Order Details JSON schema was created and stored in Schema Registry under the subject Orders. To achieve this, I followed the steps below.

  • Designed a simple/dummy Order Details JSON as below:

{"type":"record","name":"Order_Details","namespace":"dataview.in","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"},{"name":"payment_type","type":"string"},{"name":"customer_email","type":"string"}]}

and subsequently reformatted it with escape characters:

{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}

Many free online tools, such as https://www.freeformatter.com/json-formatter.html, are available for JSON formatting, JSON string escaping, etc., and can be used to perform the step above.

  • '{"schema": ""}' is the template for storing a schema inside Schema Registry. Inside the double quotes (""), the escaped Order Details JSON is appended:

'{"schema": "{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}"}'
  • Here is the complete command posted from the CLI/terminal to Confluent Schema Registry to store the new JSON schema. If successful, a schema id is returned and displayed.

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"Order_Details\",\"namespace\":\"dataview.in\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"},{\"name\":\"payment_type\",\"type\":\"string\"},{\"name\":\"customer_email\",\"type\":\"string\"}]}"}' http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions
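
On success, Schema Registry responds with the globally unique id it assigned to the schema. Assuming this is the first schema registered on a fresh installation, the response would look like:

{"id":1}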

Note: The Order Details schema stored under the subject Orders might accumulate multiple versions, each with its own id, if the schema is later updated with new fields or other modifications.

  5. A few API usages of Schema Registry’s built-in RESTful interface through a browser plug-in

As mentioned in step 3, we installed/plugged in RESTED (a REST client) on the Firefox browser and hit the URLs below to verify four basic API usages through the RESTful interface. The same can be done through the CLI or from a terminal.

  • List all the subjects

The following command can be used in the terminal to get the same response:

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects
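
With only the Orders subject registered so far, the expected response is:

["Orders"]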

  • Get or display the top-level config

Similarly, from the CLI or terminal:

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/config
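
On a default installation where the global compatibility level has not been changed, the response should be:

{"compatibilityLevel":"BACKWARD"}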

  • Fetch the most recently registered schema under the subject “Orders”

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions/latest
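
The response carries the subject, version number, schema id, and the schema itself as an escaped string; for our single registered version it would resemble the following (the id value may differ):

{"subject":"Orders","version":1,"id":1,"schema":"{\"type\":\"record\",\"name\":\"Order_Details\",...}"}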

  • List how many versions of the schema are registered under the subject “Orders”

Since we have only just registered the Order Details schema under the subject Orders and have not made any changes or modifications on top of it, only one version is returned.

$ curl -X GET http://<IP Address of node where Schema Registry is running>:8081/subjects/Orders/versions
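
This endpoint returns an array of version numbers, so with a single registered version the expected response is:

[1]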

I hope you have enjoyed this read. Please like and share if you find this write-up valuable.

Reference: https://docs.confluent.io/current/schema-registry/docs/index.html


Written by
Gautam Goswami

He can be reached at gautambangalore@gmail.com for real-time POC development and hands-on technical training, as well as for designing, developing, and helping with any Hadoop/Big Data processing related task. Gautam is an advisor and also an educator. Previously, he served as a Sr. Technical Architect in various technologies and business domains across numerous countries.
He is passionate about sharing knowledge through blogs and training workshops on various Big Data related technologies, frameworks, and systems.

