HDFS Archives » DataView

Posted December 24, 2024 In Tech Threads

Bridging the Gap: Unlocking the Power of HDFS-Based Data Lakes with Streaming Databases

The rapid development of big data technologies has highlighted the need for a smooth transition between real-time data analytics and batch processing systems. Since HDFS (Hadoop Distributed File System) based […]

Posted December 19, 2023 In Tech Threads

Leveraging Apache Kafka for the Distribution of Large Messages (in gigabyte size range)

In today’s data-driven world, the ability to transport and distribute large amounts of data, especially video files, in real time is crucial for news media companies. For example, an incident occurred […]

Posted July 6, 2023 In Tech Threads

The significance of deep storage in Apache Druid

The phrase “deep storage” refers to the long-term storage system used by Apache Druid, where past data segments are preserved for durability and future retrieval. Druid stores data […]

Posted July 26, 2022 In Tech Threads

Why Kappa Architecture for processing of streaming data. Have competence to superseding Lambda Architecture?

Data is quickly becoming the new currency of the digital economy, but it is useless if it can’t be processed. The processing of data is essential for subsequent decision-making or […]

Posted March 30, 2022 In Tech Threads

The Lakehouse: An uplift of Data Warehouse Architecture

In short, the initial architecture of the data warehouse was designed to provide analytical insights by collecting data from various heterogeneous data sources into a centralized repository and acted as […]

Posted August 4, 2020 In Tech Threads

Alternative way of loading or importing data into Hive tables running on top of HDFS based data lake.

Before writing this article, I want to extend my appreciation to all the health teams, from cleaning and sanitation staff to nurses, doctors, and others who are consistently battling to […]

Posted January 7, 2020 In Tech Threads

Why disintegration of Apache Zookeeper from Kafka is in pipeline

The main objective of this article is to highlight why the dependency between Apache ZooKeeper and Kafka is being removed, an upcoming project from the Apache Software Foundation. […]

Posted August 1, 2019 In Tech Threads

Installation of Apache Hadoop 3.2.0

Apache Hadoop 3.2.0 has been released with many notable enhancements over the previous stable release. The objective of this article is to explain the step-by-step installation of Apache […]

Posted May 30, 2019 In Tech Threads

Manual procedure to add a new Datanode into an existing basic data lake without Apache Ambari or Cloudera Manager. Constructed using HDFS (Hadoop Distributed File System) on the multi-node cluster

The aim of this article is to highlight the essential steps for adding a new DataNode to an existing multi-node Hadoop cluster. Midsize or startup […]

Posted December 13, 2017 In Tech Threads

Fault tolerance enhancement on Apache Hadoop 3.0.0-alpha2 by supporting more than 2 NameNodes.

The NameNode is the most critical resource in a Hadoop core cluster. Once very large files are loaded into the Hadoop Distributed File System (HDFS), the files get broken into block-sized chunks as […]
