The aim of this article is to outline a network topology for a multi-node hybrid Hadoop cluster built with limited hardware resources. Such a cluster is useful for learning Hadoop and for processing lower volumes of unstructured data with various engines. Before setting up the cluster, we installed Hadoop on a single-node cluster running Ubuntu 14.04 on top of Windows 10 using VMware Workstation Player. We then copied the .vmx file to the systems identified to run Hadoop DataNodes on VMware, according to hardware availability; this avoids repeated installation and saves time. Configuring the DataNodes with the NameNode will be covered in a separate article. Here is the list of hardware used to create this network topology.
1. D-Link DES-1005C 10/100 Network Switch
2. Dell Inspiron 5458 Laptop with 16 GB RAM and Windows 10 as host operating system
3. Dell Inspiron 1525 Laptop with 2 GB RAM and Ubuntu 14.04 as operating system
4. Lenovo B40-80 Laptop with 4 GB RAM and Windows 10 as host operating system
5. Desktop with 8 GB RAM and Windows 7 Professional as host operating system
6. Straight-through Ethernet cables (Cat-5e)
Practicalities prior to creating the cluster
VMware Workstation 12 Player was installed on systems 2 and 4 (the two Windows 10 hosts), with Ubuntu 14.04 installed on top of it as the guest OS. Similarly, VMware Workstation 7.x Player was installed on system 5, again with Ubuntu 14.04 as the guest OS. We disabled the Windows firewall and the antivirus firewall on all systems: an enabled firewall can block data packets entering and leaving the network and so prevent the systems in the cluster from communicating with each other. Before networking the machines, we scanned each system with up-to-date antivirus software to eliminate malware and viruses. Choosing a network switch rather than a router kept the cluster off the internet; with the firewalls disabled, an internet connection would have been a potential source of malware infection. Straight-through cables are the standard choice for connecting unlike devices (PC to switch, PC to router, router to switch), so we used straight-through cables here. The following rule-of-thumb steps must be completed to enable intercommunication (a LAN) among the physical systems and the VMs running on them.
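For reference, the built-in firewalls can also be turned off from the command line instead of through the GUI. The commands below are only a sketch for the stock Windows and Ubuntu firewalls; third-party antivirus firewalls each have their own controls, and the firewalls should be re-enabled once the isolated cluster work is finished.

```
REM Windows 7/10 host (run in an elevated Command Prompt):
REM disable the built-in Windows Firewall for all network profiles
netsh advfirewall set allprofiles state off

# Ubuntu 14.04 (native install or VMware guest):
# disable UFW if it is enabled, then confirm its status
sudo ufw disable
sudo ufw status
```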
Step 1:- Ethernet LAN setup using Network Switch
Step 2:- Network Adapter setting on the VM Player
Step 3:- Configure Internet Protocol Version 4 (TCP/IPv4) on Windows
Step 4:- Assign static IPs to Ubuntu 14.04, both the native installation and the guests running on VMware Workstation Player
Execution Steps in sequence
Step 1:- Ethernet LAN setup using Network Switch
Most modern computers have a built-in Ethernet adapter, with the port located on the back or side of the machine. Locate the Ethernet port on each system and connect a straight-through Ethernet cable (Cat-5e) between that system and the network switch. Find an open port on the switch and repeat the process until every system is connected to the switch.
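Once the cables are in place, each node will need a static IP in the same subnet (Steps 3 and 4). Before typing addresses into the Windows TCP/IPv4 dialog or the Ubuntu network configuration, a small script can sanity-check that a planned set of addresses really does fall inside one subnet. The subnet and host addresses below are hypothetical placeholders, not the ones used in this article; substitute whatever addressing plan you choose for your own LAN.

```python
import ipaddress

# Hypothetical addressing plan for four machines on the switch.
subnet = ipaddress.ip_network("192.168.10.0/24")
planned_ips = {
    "namenode":  "192.168.10.11",
    "datanode1": "192.168.10.12",
    "datanode2": "192.168.10.13",
    "datanode3": "192.168.10.14",
}

for host, ip in planned_ips.items():
    addr = ipaddress.ip_address(ip)
    # Every node must sit inside the same subnet to reach the
    # others through an unmanaged switch (no routing involved).
    assert addr in subnet, f"{host} ({ip}) is outside {subnet}"
    print(f"{host}: {ip} OK in {subnet}")
```

Running the script raises an AssertionError for any address outside the chosen subnet; silence means the plan is consistent and safe to apply by hand on each node.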
Gautam can be contacted for real-time POC development, hands-on technical training, and the development or support of Hadoop-related projects. Email: gautam@onlineguwahati.com. Gautam is a consultant and educator. Earlier, he worked as a Sr. Technical Architect across multiple technologies and business domains. He currently specializes in Big Data processing and analysis, data lake creation, and architecture using HDFS. He is also involved in HDFS maintenance, loading multiple types of data from different sources, and designing real-time use cases on customer demand to demonstrate how data can be leveraged for business transformation and profitability. He is passionate about sharing knowledge through blogs, seminars, and presentations on Big Data technologies, methodologies, real-time projects and their architecture and design, procedures for high-volume data ingestion, and basic data lake creation.