Integrating rate-limiting and backpressure strategies synergistically to handle and alleviate consumer lag in Apache Kafka

Apache Kafka stands as a robust distributed streaming platform. However, like any system, it is imperative to proficiently oversee and control latency for optimal performance. Kafka Consumer Lag refers to […]

Leveraging Apache Kafka for the Distribution of Large Messages (in gigabyte size range)

In today’s data-driven world, the capability to transport and circulate large amounts of data, especially video files, in real-time is crucial for news media companies. For example, an incident occurred […]

The Zero Copy principle subtly encourages Apache Kafka to be more efficient.

The Apache Kafka, a distributed event streaming technology, can process trillions of events each day and eventually demonstrate its tremendous throughput and low latency. That’s building trust and over 80% […]

Understanding of Supervisor and it’s specification in Apache Druid for real-time data ingestion from Apache Kafka

Although both Apache Druid and Apache Kafka are potent open-source data processing tools, they have diverse uses. While Druid is a high-performance, column-store, real-time analytical database, Kafka is a distributed […]

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently fail when consumed, regardless of the number of consumption attempts. Poison Pill scenarios are frequently […]

Apache Kafka’s built-in command line tools – A hidden gem to scan internals.

Several tools/scripts are included in the bin directory of the Apache Kafka binary installation. Even if that directory has a number of scripts, through this article I want to highlight […]

The significance of deep storage in Apache Druid

The phrase “deep storage” refers to the long-term storage system used by Apache Druid, where past data segments are preserved for durability and retrieval in the future. Druid stores data […]

Forging Apache Druid with Apache Kafka for real-time streaming analytics

A real-time analytics database called Apache Druid is developed for quick slice-and-dice analysis on massive data volumes. The best data for Apache Druid is event-oriented and frequently utilized as the […]

Knowing and valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, We should first carefully examine the replication process in the Kafka broker. In short, replication means having multiple copies of our […]

Handling bad messages via DLQ by configuring JDBC Kafka Sink Connector

Any trustworthy data streaming pipeline needs to be able to identify and handle faults. Exceptionally while IoT devices ingest endlessly critical data/events into permanent persistence storage like RDBMS for future […]