Handling bad messages via DLQ by configuring JDBC Kafka Sink Connector

Any trustworthy data streaming pipeline needs to be able to identify and handle faults, especially when IoT devices continuously ingest critical data/events into persistent storage such as an RDBMS for future […]

Data Streaming

Streaming Data from Files into Multi-Broker Kafka Cluster by anchoring Kafka Connect Framework

There are multiple ways to ingest data streams into a Kafka topic and subsequently deliver them to the various types of consumers subscribed to that topic. The stream of data that […]

Apache Zookeeper QuorumPeerMain

Resolve Apache Zookeeper starting issue installed on multi-node cluster

This short article explains how to resolve the error “Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain” when starting Apache Zookeeper (apache-zookeeper-3.5.6.tar.gz) installed on a multi-node cluster. […]

Hive-3.1.2

Install and Configuration of Apache Hive-3.1.2 on multi-node Hadoop-3.2.0 cluster with MySQL for Hive metastore

Apache Hive is a data warehouse system built on top of Apache Hadoop. Hive can be used for easy data summarization, ad-hoc queries, and analysis of large datasets stored […]

Hadoop-3.2.0 multi-node cluster

Apache Hadoop-3.2.0 installation on the multi-node cluster with an alternative backup recovery

The objective of this article is to explain how to deploy the latest version of Apache Hadoop (stable release: 3.2.0 / January 16, 2019) on a multi-node cluster to store […]

Permission issue

Resolve permission issue among DataNodes with NameNode to establish Secure Shell /SSH without passphrase

When configuring and deploying a multi-node Hadoop cluster or adding new DataNodes, an SSH permission issue is sometimes observed in communication with the Hadoop daemons. In […]

Installation of Apache Hadoop 3.2.0

Apache Hadoop 3.2.0 has been released with many notable enhancements over the previous stable release. The objective of this article is to explain the step-by-step installation of Apache […]

Manual procedure to add a new DataNode to an existing basic data lake, constructed using HDFS (Hadoop Distributed File System) on a multi-node cluster, without Apache Ambari or Cloudera Manager

The aim of this article is to highlight the essential steps for adding a new DataNode to an existing multi-node Hadoop cluster. Midsize or startup […]

Network Topology to create Multi Node Hybrid cluster for Hadoop Installation

The aim of this article is to provide an outline for creating a network topology for Hadoop installation in a multi-node hybrid cluster with limited hardware resources. This cluster would […]