Tech Threads

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS). Because HDFS distributes storage across the nodes of a cluster, it can hold very large volumes of data in any format. In an RDBMS, data persists in rows and columns (known as structured data). To process large volumes of enterprise data, we can use HDFS as a basic data lake.

In the above video, we explain how Sqoop can be used to transfer data (an e-commerce application's order-related data) from Oracle 11g to HDFS. A single-node cluster with Hadoop 2.x installed is used here.
The data ingestion mechanism would be the same if Sqoop were used on a multi-node cluster or in a cloud environment such as Microsoft Azure, Amazon Web Services, or Google Cloud Platform.
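For readers without access to the video, the transfer can be sketched as a single `sqoop import` invocation. The small Python helper below only assembles that command line; it does not run it. The host, port, service name, user, password file, table name, and target directory are all placeholder assumptions and are not taken from the video. The flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`, `--split-by`) are standard Sqoop 1.x import options.

```python
import shlex


def build_sqoop_import(host, port, service, username, password_file,
                       table, target_dir, num_mappers=1, split_by=None):
    """Return the argv list for a `sqoop import` of one Oracle table into HDFS.

    Every value passed in is supplied by the caller; nothing here is
    specific to the setup shown in the video.
    """
    # Oracle thin-driver JDBC URL in the //host:port/service form.
    jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service}"
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Reads the password from a file instead of exposing it on the command line.
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks; 1 is enough on a single-node cluster.
        "--num-mappers", str(num_mappers),
    ]
    if split_by is not None:
        # Column used to partition the rows among multiple mappers.
        argv += ["--split-by", split_by]
    return argv


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmd = build_sqoop_import(
        host="localhost", port=1521, service="XE",
        username="SHOP", password_file="/user/hadoop/.oracle.pw",
        table="ORDERS", target_dir="/data/ecommerce/orders",
    )
    print(shlex.join(cmd))
```

On a multi-node or cloud-hosted cluster the same command applies; typically only the connection details and `num_mappers` (together with a suitable `split_by` column) would change.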

Written by
Gautam Goswami

Can be reached for real-time POC development and hands-on technical training at gautambangalore@gmail.com, as well as for help with designing and developing any Hadoop/Big Data related task. Gautam is an advisor and an educator. Before that, he worked as a Sr. Technical Architect across different technologies and business domains in numerous countries.
He is passionate about sharing knowledge through blogs and training workshops on Big Data technologies, systems, and related topics.

 

 

