Tech Threads

Importance of unstructured data

In today’s world, the Internet plays a major role in generating and propagating information from various sources. Social media, email, WhatsApp, online newspapers, and similar channels are central to both creating and circulating that information. This information often includes text and multimedia content. It cannot be stored methodically in the rows and columns of a database, and it is referred to as unstructured data.


Due to advances in technology, an estimated 70-80% of the data now being generated is unstructured, and it is growing significantly faster than structured and semi-structured data. Typically, we store structured data, and semi-structured data after processing, in a traditional database, but unstructured data cannot be stored that way. In today’s tough business environment, most organizations have started analyzing unstructured data to make better business decisions. The Hadoop Distributed File System (HDFS) is an excellent framework for storing unstructured data.

In HDFS, we do not have to design a schema of rows and columns before accumulating data. Data can be ingested directly into HDFS and processed later in a distributed manner to get the desired result. Unstructured data is closely associated with the term “Big Data”, which is turning the entire world of information in a new direction, and Hadoop enables us to process thousands of petabytes of unstructured data.
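
The idea of ingesting data without a schema and processing it later can be illustrated with a small sketch. This is plain Python, not Hadoop code: the sample records, the block size, and the function names (split_into_blocks, map_block, word_count) are all hypothetical and chosen only for illustration. Raw records of different shapes are kept exactly as they arrive, structure is imposed only when they are read, and the work is split into blocks whose partial results are merged, loosely mirroring how a distributed job might run.

```python
from collections import Counter
from functools import reduce

# Raw, heterogeneous records kept exactly as they arrive (no table schema).
# These samples are made up for illustration.
RAW_RECORDS = [
    "From: alice@example.com Subject: quarterly sales report",
    "Great phone, battery life could be better #review",
    '{"user": "bob", "comment": "sales team did great"}',
    "Breaking: local team wins the final",
]


def split_into_blocks(records, block_size):
    """Split the raw records into fixed-size blocks (a loose analogy to file blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: impose structure at read time by tokenizing each raw record."""
    counts = Counter()
    for record in block:
        for token in record.lower().split():
            word = token.strip('.,:;#{}"')  # drop surrounding punctuation
            if word:
                counts[word] += 1
    return counts


def word_count(records, block_size=2):
    """Count words per block, then merge the partial counts (reduce step)."""
    blocks = split_into_blocks(records, block_size)
    # In a real cluster each block could be processed on a different node;
    # here the blocks are simply processed one after another.
    partials = [map_block(block) for block in blocks]
    return reduce(lambda left, right: left + right, partials, Counter())


totals = word_count(RAW_RECORDS)
print(totals.most_common(3))
```

Because each block is counted independently and the partial counts are only added together at the end, the result does not depend on how the records are split, which is what makes this style of processing easy to distribute.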

Written by
Gautam Goswami

