Due to the exponential growth of digitalization, the world creates at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, which we can refer to as Big Data. Data is generated everywhere: social media sites, sensors, satellites, purchase transactions, mobile devices, GPS signals and much more. As technology advances, data generation shows no sign of slowing down; instead, volumes will keep growing. Major organizations, retailers, companies across different verticals and enterprise product vendors have started leveraging big data technologies to produce actionable insights and drive business expansion and growth.
– The data ingestion (or consumption) layer can include Apache Kafka, Flume and similar tools, which are responsible for gathering data from multiple sources. Depending on whether the data must be processed in batches, as a live stream, or as a combination of both, the flow bifurcates here, like the arms of the lambda sign (λ).
– In the batch layer, all the data accumulates before any computation runs on top of it. Fault tolerance and replication can be achieved here to prevent data loss. The Hadoop Distributed File System (HDFS) can be used in this layer.
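The split described in the two layers above can be illustrated with a small, in-memory sketch. It is a toy model and not a Kafka or HDFS integration: the class name `LambdaIngest`, the `source` field on each event, and the per-source counting are all assumptions made for illustration. A plain Python list stands in for the batch store and a counter stands in for the live-streaming path; replication and fault tolerance are not modelled.

```python
from collections import Counter


class LambdaIngest:
    """Hypothetical stand-in for the ingestion layer (not a real Kafka/Flume API)."""

    def __init__(self):
        # Batch path: append-only store where all data accumulates first.
        self.master_dataset = []
        # Streaming path: a view updated as each event arrives.
        self.realtime_counts = Counter()

    def ingest(self, event):
        """Fan one event out to both paths (the lambda-shaped bifurcation)."""
        self.master_dataset.append(event)
        self.realtime_counts[event["source"]] += 1

    def run_batch(self):
        """Compute a view over everything accumulated so far."""
        return Counter(e["source"] for e in self.master_dataset)


if __name__ == "__main__":
    pipeline = LambdaIngest()
    for source in ["sensor", "mobile", "sensor"]:
        pipeline.ingest({"source": source})
    print(pipeline.run_batch())
```

In this sketch the batch view is recomputed from the full accumulated dataset on every call, which mirrors the idea that the batch layer sees all the data at once, while the streaming counter is updated incrementally per event.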