Hive Data loading

Alternative way of loading or importing data into Hive tables running on top of HDFS based data lake.

Preceding pen down the article, might want to stretch out appreciation to all the wellbeing teams beginning from cleaning/sterile group to Nurses, Doctors and other who are consistently battling to […]

Kafka Zookeeper disintegration

Why disintegration of Apache Zookeeper from Kafka is in pipeline

  The main objective of this article is to highlight why to cut the bridge between Apache Zookeeper and Kafka which is an upcoming project from the Apache software foundation. […]

Installation of Apache Hadoop 3.2.0

Apache Hadoop 3.2.0 has been released after incorporating many outstanding enhancements over the previous stable release. The objective of this article is to explain step by step installation of Apache […]

Manual procedure to add a new Datanode into an existing basic data lake without Apache Ambari or Cloudera Manager. Constructed using HDFS (Hadoop Distributed File System) on the multi-node cluster

   The aim of this article is to highlight the essential steps when there would be a need for a new DataNode into an exiting multi-node Hadoop cluster. Midsize or […]

FAULT TOLERANCE ENHANCEMENT ON APACHE HADOOP 3.0.0

Fault tolerance enhancement on Apache Hadoop 3.0.0-alpha2 by supporting more than 2 NameNodes.

NameNode is the most critical resource in Hadoop core cluster. Once very large files loaded into the Hadoop Distributed File System (HDFS), the files get broken into block-sized chunks as […]

mapper

Steering number of mapper (MapReduce) in sqoop for parallelism of data ingestion into Hadoop Distributed File System (HDFS)

To import data from most the data source like RDBMS, SQOOP internally use mapper. Before delegating the responsibility to the mapper, sqoop performs few initial operations in a sequence once […]

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from Relational Database Management System to Hadoop distributed file system (HDFS). Because of distributed storage mechanism in Hadoop Distributed File System (HDFS), we […]

Data Lake

Basic concept on Data Lake

The info graphics representing the basic concept of Data Lake where we can use the approach ELT (Extraction, loading and then transformation) against traditional ETL (Extraction, Transformation and then loading)process. […]