Tech Threads

Basic concept on Data Lake

The info graphics representing the basic concept of Data Lake where we can use the approach ELT (Extraction, loading and then transformation) against traditional ETL (Extraction, Transformation and then loading)process. ETL process implies to traditional data warehousing system where structured data format follows (raw and column).By leveraging HDFS (Hadoop Distributed File System), we can develop data lake to store any format data in order to process and analysis. Directly data can be loaded in the Lake without transformation, later transformation can be performed on demand basis. Data Lake concept is offering a tremendous advantage and benefit.

Huge volume of data can be stored in a distributed manner.
Format of data is not a criteria in Data Lake. Any data format can be stored like structured, Semi-Structured and Unstructured.
Semi-Structured and Unstructured data can be stored in traditional data warehousing system. Pre-processing steps mandatory to convert into Structured data format before loading. These steps are very expensive and time-consuming and chances of data loss/corrupt highly visible.
Commodity hardware can be utilized to create/develop a Data Lake. Besides, it’s fault tolerant.

Written by
Gautam Goswami

Can be reached for real-time POC development and hands-on technical training at gautambangalore@gmail.com. Besides, to design, develop just as help in any Hadoop/Big Data handling related task. Gautam is a advisor and furthermore an Educator as well. Before that, he filled in as Sr. Technical Architect in different technologies and business space across numerous nations.
He is energetic about sharing information through blogs, preparing workshops on different Big Data related innovations, systems and related technologies.

Page: 1 2

Next Fog Computing »

Previous « Real time data analytics helps mobile service providers to achieve aggressive advantages

AI on the Fly: Real-Time Data Streaming from Apache Kafka To Live Dashboards

In the current fast-paced digital age, many data sources generate an unending flow of information,… Read More

4 days ago

Tech Threads

Real-Time at Sea: Harnessing Data Stream Processing to Power Smarter Maritime Logistics

According to the International Chamber of Shipping, the maritime industry has increased fourfold in the… Read More

3 weeks ago

Tech Threads

Driving Streaming Intelligence On-Premises: Real-Time ML with Apache Kafka and Flink

Lately, companies, in their efforts to engage in real-time decision-making by exploiting big data, have… Read More

2 months ago

Tech Threads

Dark Data Demystified: The Role of Apache Iceberg

Lurking in the shadows of every organization is a silent giant—dark data. Undiscovered log files,… Read More

3 months ago

Tech Threads

The Role of Materialized Views in Modern Data Stream Processing Architectures + RisingWave

Incremental computation in data streaming means updating results as fresh data comes in, without redoing… Read More

6 months ago

Tech Threads

Unlocking the Power of Patterns in Event Stream Processing (ESP): The Critical Role of Apache Flink’s FlinkCEP Library

We call this an event when a button is pressed, a sensor detects a temperature… Read More