Tech Threads

Apache Kafka, The next Generation Distributed Messaging System.

In Big Data project, the main challenge is to collect an enormous volume of data. We need a distributed high throughput messaging systems to overcome it. Apache Kafka is designed to address the challenge. It was originally developed at LinkedIn Corporation and later on became a part of Apache project. A Messaging System is typically responsible for transferring data from one application to another.


A message is nothing but the bunch of data/information. To ingest huge volume of data into Hadoop Distributed File System (HDFS), we need to have distributed messaging system that runs on a cluster of servers and Kafka is an excellent choice for it. Kafka is very easy to scale out and offer high throughput.
Apache Kafka supports multi-subscribers and automatically balances the consumers during failure. It supports multi-subscribers and automatically balances the consumers during failure. Besides, Kafka persists messages on systems disk and thus can be used for batched consumption of messages such as ETL (Extraction, Transformation and Loading). Please visit https://kafka.apache.org/ to know more.
Written by
Gautam Goswami    

Can be contacted for real time POC development and hands-on technical training. Also to develop/support any Hadoop related project. Email:- gautam@onlineguwahati.com. Gautam is a consultant as well as Educator. Prior to that, he worked as Sr. Technical Architect in multiple technologies and business domain. Currently, he is specializing in Big Data processing and analysis, Data lake creation, architecture etc. using HDFS. Besides, involved in HDFS maintenance and loading of multiple types of data from different sources, Design and development of real time use case development on client/customer demands to demonstrate how data can be leveraged for business transformation, profitability etc. He is passionate about sharing knowledge through blogs, seminars, presentations etc. on various Big Data related technologies, methodologies, real time projects with their architecture /design, multiple procedure of huge volume data ingestion, basic data lake creation etc.

AddThis Website Tools

Recent Posts

AI on the Fly: Real-Time Data Streaming from Apache Kafka To Live Dashboards

In the current fast-paced digital age, many data sources generate an unending flow of information,… Read More

2 days ago

Real-Time at Sea: Harnessing Data Stream Processing to Power Smarter Maritime Logistics

According to the International Chamber of Shipping, the maritime industry has increased fourfold in the… Read More

3 weeks ago

Driving Streaming Intelligence On-Premises: Real-Time ML with Apache Kafka and Flink

Lately, companies, in their efforts to engage in real-time decision-making by exploiting big data, have… Read More

2 months ago

Dark Data Demystified: The Role of Apache Iceberg

Lurking in the shadows of every organization is a silent giant—dark data. Undiscovered log files,… Read More

3 months ago

The Role of Materialized Views in Modern Data Stream Processing Architectures + RisingWave

Incremental computation in data streaming means updating results as fresh data comes in, without redoing… Read More

6 months ago

Unlocking the Power of Patterns in Event Stream Processing (ESP): The Critical Role of Apache Flink’s FlinkCEP Library

We call this an event when a button is pressed, a sensor detects a temperature… Read More

6 months ago