Streaming real-time data from Kafka 3.7.0 to Flink 1.18.1 for processing

Over the past few years, Apache Kafka has emerged as the leading standard for streaming data. Fast-forward to the present day, and Kafka has achieved ubiquity, being adopted by at least […]

Why Apache Kafka and Apache Flink work incredibly well together to boost real-time data analytics

When data is analyzed and processed in real time, it can yield insights and actionable information either instantly or with very little delay from the time the data is collected. The […]

Integrating rate-limiting and backpressure strategies synergistically to handle and alleviate consumer lag in Apache Kafka

Apache Kafka is a robust distributed streaming platform. However, as with any system, latency must be monitored and controlled for optimal performance. Kafka consumer lag refers to […]

Leveraging Apache Kafka for the Distribution of Large Messages (in gigabyte size range)

In today’s data-driven world, the ability to transport and distribute large amounts of data, especially video files, in real time is crucial for news media companies. For example, an incident occurred […]

The Zero Copy principle subtly encourages Apache Kafka to be more efficient.

Apache Kafka, a distributed event streaming technology, can process trillions of events each day, demonstrating its tremendous throughput and low latency. That’s building trust and over 80% […]

Understanding of Supervisor and its specification in Apache Druid for real-time data ingestion from Apache Kafka

Although both Apache Druid and Apache Kafka are powerful open-source data processing tools, they serve different purposes. While Druid is a high-performance, column-store, real-time analytical database, Kafka is a distributed […]

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently fail when consumed, regardless of the number of consumption attempts. Poison pill scenarios are frequently […]

Apache Kafka’s built-in command line tools – A hidden gem to scan internals.

Several tools/scripts are included in the bin directory of the Apache Kafka binary installation. Although that directory contains a number of scripts, in this article I want to highlight […]

The significance of deep storage in Apache Druid

The phrase “deep storage” refers to the long-term storage system used by Apache Druid, where past data segments are preserved for durability and retrieval in the future. Druid stores data […]

Forging Apache Druid with Apache Kafka for real-time streaming analytics

Apache Druid is a real-time analytics database designed for quick slice-and-dice analysis on massive data volumes. The best data for Apache Druid is event-oriented and frequently utilized as the […]