Due to the exponential growth of digitalization, the world is creating at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, and this is what we refer to as Big Data. Data is generated everywhere: social media sites, various sensors, satellites, purchase transactions, mobile devices, GPS signals, and much more. With advancing technology, there is no sign of data generation slowing down; instead, it will keep growing in massive volume. Major organizations, retailers, companies across different verticals, and enterprise product teams have started focusing on leveraging big data technologies to produce actionable insights and drive business expansion and growth.
– In the data ingestion (or consumption) layer, tools such as Apache Kafka and Flume are responsible for gathering data from multiple sources. Depending on whether data needs to be processed in batches, as a live stream, or as a combination of both, the flow bifurcates here into two paths, much like the shape of the lambda sign (λ); see the sketch after this list.
– In the batch layer, all the data accumulates before any computation runs on top of it. Fault tolerance and replication are achieved here to prevent data loss. The Hadoop Distributed File System (HDFS) is a typical choice for this layer.
– The streaming (or speed) layer is responsible for processing live streaming data without persisting it to storage; data is processed while it is in motion. This layer is triggered as data arrives, or at short fixed intervals, and it generates a real-time view that is pushed to the next layer (the serving layer).
– Finally, in the serving layer, results from the streaming layer and the batch layer are combined and can be used to provide a unified, queryable result. This layer receives updates from the batch layer periodically and from the streaming layer in real time.
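The following is a minimal Java sketch of how these layers could fit together, not a production implementation. It assumes a Kafka broker at localhost:9092 with a hypothetical topic named "events" and an HDFS namenode at hdfs://localhost:9000; the class and path names are made up for illustration. A single Kafka consumer feeds two paths: the batch path appends raw events to an HDFS master dataset (to be reprocessed later by a batch job), while the speed path maintains an in-memory real-time view. The serving-layer method merges the batch view with the real-time view.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.net.URI;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class LambdaPipelineSketch {

    // Speed layer: real-time view, updated continuously from incoming events.
    private static final Map<String, Long> realTimeView = new HashMap<>();
    // Batch layer output: precomputed view, refreshed periodically by a separate
    // batch job (e.g. MapReduce/Spark over the HDFS master dataset); empty here.
    private static final Map<String, Long> batchView = new HashMap<>();

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker
        props.put("group.id", "lambda-sketch");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Batch layer storage: immutable master dataset on HDFS (assumed namenode).
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000"),
                new Configuration());
        Path masterDataset = new Path("/lambda/master/events.log");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             FSDataOutputStream master = hdfs.create(masterDataset, true)) {
            consumer.subscribe(Collections.singletonList("events")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String key = record.key() == null ? "unknown" : record.key();
                    // Batch path: append the raw event to the master dataset.
                    master.writeBytes(key + "\t" + record.value() + "\n");
                    // Speed path: update the real-time view while data is in motion.
                    realTimeView.merge(key, 1L, Long::sum);
                }
                master.hflush(); // make appended events visible to batch jobs
            }
        }
    }

    // Serving layer: merge the periodically refreshed batch view with the
    // real-time view to answer queries with a unified result.
    static long countFor(String key) {
        return batchView.getOrDefault(key, 0L)
                + realTimeView.getOrDefault(key, 0L);
    }
}
```

In a real deployment the batch and speed layers would run as separate processes (for example, a Spark or MapReduce job recomputing the batch view over the HDFS data, and a streaming engine maintaining the real-time view), but the routing of each event down both paths is the essence of the pattern.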
Lambda Architecture is also pluggable: data generation sources can be plugged in and plugged out on demand, as the rough illustration below suggests.
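One hedged way to picture this pluggability in Java is to hide each source behind a common ingestion interface, so sources can be registered or removed at runtime. The interface and class names here are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: any data generation source (Kafka topic, Flume
// channel, file drop, sensor feed) exposes the same pull-style contract.
interface DataSource {
    String name();
    List<String> fetchEvents();   // pull one micro-batch of raw events
}

public class PluggableIngestion {
    private final List<DataSource> sources = new ArrayList<>();

    // Plug a new source into the pipeline on demand.
    public void plugIn(DataSource source) { sources.add(source); }

    // Plug a source out again without touching the rest of the pipeline.
    public void plugOut(String name) {
        sources.removeIf(s -> s.name().equals(name));
    }

    // Gather events from whatever sources are currently plugged in.
    public List<String> collect() {
        List<String> all = new ArrayList<>();
        for (DataSource s : sources) {
            all.addAll(s.fetchEvents());
        }
        return all;
    }
}
```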
Gautam can be contacted for real-time POC development, hands-on technical training, and development or support of Hadoop-related projects. Email: [email protected]. Gautam is a consultant and educator. Prior to that, he worked as a Senior Technical Architect across multiple technologies and business domains. He currently specializes in Big Data processing and analysis, data lake creation, and architecture using HDFS. He is also involved in HDFS maintenance, loading multiple types of data from different sources, and the design and development of real-time use cases on customer demand to demonstrate how data can be leveraged for business transformation and profitability. He is passionate about sharing knowledge through blogs, seminars, and presentations on Big Data technologies, methodologies, real-time projects and their architecture and design, procedures for ingesting huge volumes of data, and basic data lake creation.