
Proof of concept to analyse huge application log files using a Hadoop cluster on the IBM Cloud Platform

Analysing the application log files generated in a production environment is very challenging. The data in the log files is unstructured, so it can’t be stored in an RDBMS or other traditional database system and queried there without first being converted to a structured format. As a result, if an application misbehaves for a very short duration, troubleshooting it from the information recorded in a large log file, possibly hundreds of terabytes in size, is nearly impossible.

As part of our POC development, we found that in an e-commerce application running on the Oracle Web Commerce platform (ATG), asynchronous communication with a third-party vendor was sometimes not established during order fulfilment. JMS messaging was responsible for delivering the order submission message from ATG to the third-party vendor and vice versa, but periodically it failed to do so. Using a Hadoop cluster with a customized MapReduce program, we extracted the exact warnings and errors recorded in the log files produced by an out-of-the-box ATG component. After a detailed analysis of that framework component, based on the reports produced by the Hadoop job, we concluded that the issue lay within the ATG framework itself. We communicated this to the software vendor and subsequently received a patch from them.
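The extraction step described above can be illustrated with a small map and reduce pair. This is only a sketch under stated assumptions, not the program used in the POC: the log line layout, the component path `/example/messaging/JmsSender`, the sample lines, and the function names `map_line`, `reduce_counts` and `analyse` are all hypothetical. It runs in a single Python process to show the logic; on a real cluster the same two steps would be run by Hadoop across many nodes.

```python
import re
from itertools import groupby

# Hypothetical line layout (an assumption, not taken from the POC):
#   **** <Severity>\t<timestamp>\t<component path>\t<message>
LINE_RE = re.compile(
    r"^\*{4} (?P<severity>Error|Warning)\t[^\t]*\t(?P<component>/\S+)\t(?P<message>.*)$"
)


def map_line(line):
    """Map step: emit ((severity, component), 1) for warning and error lines only."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match:
        yield (match.group("severity"), match.group("component")), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts per key, after sorting to mimic shuffle/sort."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def analyse(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(pairs))


# Made-up sample data, for illustration only.
SAMPLE_LOG = [
    "**** Error\tMon Jan 05 10:00:01 2015\t/example/messaging/JmsSender\tDelivery failed",
    "**** info\tMon Jan 05 10:00:02 2015\t/example/order/OrderManager\tOrder submitted",
    "**** Warning\tMon Jan 05 10:00:03 2015\t/example/messaging/JmsSender\tRetrying delivery",
    "**** Error\tMon Jan 05 10:00:04 2015\t/example/messaging/JmsSender\tDelivery failed",
]

if __name__ == "__main__":
    for (severity, component), count in sorted(analyse(SAMPLE_LOG).items()):
        print(f"{severity}\t{component}\t{count}")
```

Grouping by severity and component is one possible key choice; grouping by time window instead would make short-lived failures, like the periodic delivery problem, easier to spot.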
