Apache Hive is data warehouse software built on top of Apache Hadoop. Hive simplifies data summarization, ad-hoc queries, and analysis of large datasets stored in the various databases and file systems integrated with Hadoop. Typically, we use Hive to apply structure (tables) to large amounts of unstructured data persisted in HDFS and then query that data for analysis. Listed below are a few videos recorded while installing Apache Hive 3.1.2 on a multi-node Hadoop 3.2.0 cluster, along with the prerequisite steps that are mandatory for successfully configuring and running Hive.
Apache Hive does not yet support JDK 11, so we downgraded our multi-node Hadoop cluster to JDK 8. The cluster had earlier been configured with JDK 11 for Hadoop 3.2.0. The JDK 8 tarball was downloaded from the Oracle site and copied to each DataNode as well as the NameNode. This video is also useful for other purposes, such as a simple JDK installation on Ubuntu or another Linux distribution for learning or for setting up Java-based applications.
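Before starting Hive on a node, it can help to confirm that the node really runs JDK 8. The following is a minimal sketch, not part of the original setup: the helper names `java_major_version` and `is_jdk8` are hypothetical, and the sketch assumes the version string has already been read (for example from the output of `java -version`).

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from a string such as '1.8.0_281' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering scheme: '1.8.0_281' denotes Java 8.
    if first == 1 and second is not None:
        return int(second)
    # Modern numbering scheme: '11.0.2' denotes Java 11.
    return first


def is_jdk8(version: str) -> bool:
    """Hypothetical helper: True when the version string denotes Java 8."""
    return java_major_version(version) == 8
```

Running a check like this on every DataNode and on the NameNode would catch a node that was missed during the downgrade.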
Here you can see how to install the latest version of the MySQL database on Ubuntu 14.04 LTS. We installed it on a single DataNode in the cluster that has a high hardware configuration. Apache Hive must be connected to an RDBMS to store its metadata. By default, Apache Hive ships with the Derby database as its metastore.
Depending on the RDBMS chosen for Apache Hive, the corresponding JDBC driver has to be downloaded and configured so that Hive can communicate with the database and store its metadata. Since we installed MySQL as the metastore, the video below explains how to download and configure its JDBC driver.
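Once the driver is in place, Hive reads the metastore connection settings from `hive-site.xml`. The sketch below generates the four `javax.jdo.option.*` connection properties. The host name, database name, user, and password are placeholders, and the driver class `com.mysql.cj.jdbc.Driver` is an assumption that matches MySQL Connector/J 8 (older 5.x connectors use `com.mysql.jdbc.Driver`). The function name `build_hive_site` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_hive_site(jdbc_url: str, driver: str, user: str, password: str) -> str:
    """Render a minimal hive-site.xml holding only the metastore connection properties."""
    config = ET.Element("configuration")
    properties = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


# Placeholder values; replace them with the details of your own MySQL node.
xml_text = build_hive_site(
    "jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true",
    "com.mysql.cj.jdbc.Driver",
    "hiveuser",
    "change-me",
)
```

Generating the file from code keeps the same settings consistent across nodes, although editing `hive-site.xml` by hand works just as well for a single installation.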
This video explains how to create the metastore schema in the MySQL database by executing the script (.sql file) provided by Apache Hive.
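To find the script to execute, the sketch below looks for the newest MySQL schema file in the Hive installation. It assumes the scripts live under `scripts/metastore/upgrade/mysql` inside the Hive home directory and are named like `hive-schema-<version>.mysql.sql`. Verify both against your own download, since the layout may differ between releases. The function name `latest_mysql_schema_script` is hypothetical.

```python
import re
from pathlib import Path


def latest_mysql_schema_script(hive_home: str) -> Path:
    """Return the highest-versioned hive-schema-*.mysql.sql file under hive_home."""
    # Assumed location of the metastore schema scripts inside the Hive distribution.
    script_dir = Path(hive_home) / "scripts" / "metastore" / "upgrade" / "mysql"
    pattern = re.compile(r"hive-schema-(\d+)\.(\d+)\.(\d+)\.mysql\.sql$")
    candidates = []
    for path in script_dir.glob("hive-schema-*.mysql.sql"):
        match = pattern.match(path.name)
        if match:
            # Compare versions numerically, so that 0.14.0 sorts above 0.9.0.
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, path))
    if not candidates:
        raise FileNotFoundError(f"no MySQL schema script found in {script_dir}")
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be passed to the MySQL client, for example with its `SOURCE` command, to create the metastore tables.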
Here you can see how to install Apache Hive 3.1.2 on a DataNode with a high hardware configuration in a multi-node cluster where Hadoop 3.2.0 is already running.
Please click here to read the complete article on Apache Hive on a multi-node cluster.
Source: The Apache Software Foundation