Tech Threads

How Google news able to group similar news together

Classification technique or engine generally utilise to segregate the emails (Spam and Non-spam emails) based on known data like

  • Previously marked spam
  • Trigger words and phrases
  • Blacklisted IP address
  • Similar to known spam
  • Different language.

using above parameters on the new incoming email, classification engine determines where to go the email either spam or non-spam.
Written by
Gautam Goswami    

Can be contacted for real time POC developement and handson technical training. Also to develop/support any Hadoop related project. Email:- gautam@onlineguwahati.com. Gautam is a consultant as well as Educator. Prior to that, he worked as Sr. Technical Architect in multiple technologies and business domain. Currently, he is specializing in Big Data processing and analysis, Data lake creation, architecture etc. using HDFS. Besides, involved in HDFS maintenance and loading of multiple types of data from different sources, Design and development of real time use case development on client/customer demands to demonstrate how data can be leveraged for business transformation, profitability etc. He is passionate about sharing knowledge through blogs, seminars, presentations etc. on various Big Data related technologies, methodologies, real time projects with their architecture /design, multiple procedure of huge volume data ingestion, basic data lake creation etc.

Page: 1 2

Recent Posts

RisingWave- An unwrinkled road to event stream processing

A distributed SQL streaming database called RisingWave makes processing streaming data easy, dependable, and efficient.… Read More

6 months ago

Transferring real-time data processed within Apache Flink to Kafka

Transferring real-time data processed within Apache Flink to Kafka and ultimately to Druid for analysis/decision-making.… Read More

7 months ago

Streaming real-time data from Kafka 3.7.0 to Flink 1.18.1 for processing

Over the past few years, Apache Kafka has emerged as the leading standard for streaming… Read More

9 months ago

Why Apache Kafka and Apache Flink work incredibly well together to boost real-time data analytics

When data is analyzed and processed in real-time, it can yield insights and actionable information… Read More

10 months ago

Integrating rate-limiting and backpressure strategies synergistically to handle and alleviate consumer lag in Apache Kafka

Apache Kafka stands as a robust distributed streaming platform. However, like any system, it is… Read More

10 months ago

Leveraging Apache Kafka for the Distribution of Large Messages (in gigabyte size range)

In today's data-driven world, the capability to transport and circulate large amounts of data, especially… Read More

11 months ago