Categories: Tech Threads

Manual procedure to add a new Datanode into an existing basic data lake without Apache Ambari or Cloudera Manager. Constructed using HDFS (Hadoop Distributed File System) on the multi-node cluster

Written by
Gautam Goswami    

Can be contacted for real time POC development and hands-on technical training. Also to develop/support any Hadoop related project. Email:- gautam@onlineguwahati.com. Gautam is a consultant as well as Educator. Prior to that, he worked as Sr. Technical Architect in multiple technologies and business domain. Currently, he is specializing in Big Data processing and analysis, Data lake creation, architecture etc. using HDFS. Besides, involved in HDFS maintenance and loading of multiple types of data from different sources, Design and development of real time use case development on client/customer demands to demonstrate how data can be leveraged for business transformation, profitability etc. He is passionate about sharing knowledge through blogs, seminars, presentations etc. on various Big Data related technologies, methodologies, real time projects with their architecture /design, multiple procedure of huge volume data ingestion, basic data lake creation etc.

Page: 1 2

Recent Posts

Understanding of Supervisor and it’s specification in Apache Druid for real-time data ingestion from Apache Kafka

Although both Apache Druid and Apache Kafka are potent open-source data processing tools, they have… Read More

7 months ago

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently… Read More

8 months ago

Apache Kafka’s built-in command line tools – A hidden gem to scan internals.

Several tools/scripts are included in the bin directory of the Apache Kafka binary installation. Even… Read More

9 months ago

The significance of deep storage in Apache Druid

The phrase "deep storage" refers to the long-term storage system used by Apache Druid, where… Read More

10 months ago

Forging Apache Druid with Apache Kafka for real-time streaming analytics

A real-time analytics database called Apache Druid is developed for quick slice-and-dice analysis on massive… Read More

11 months ago

Knowing and valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, We should first carefully examine the… Read More

12 months ago