An overview of how to install Apache Hive on Ubuntu and configure it through environment variables and configuration files.
Apache Hive is a data warehouse infrastructure that facilitates querying and managing large data sets which reside in distributed storage systems. It is developed on top of Hadoop. Hive has its own SQL-like query language called HiveQL (Hive Query Language).
Since Hive is built on top of Hadoop, both Java and Hadoop need to be installed on your system.
Before installing Hive, make sure your Hadoop installation works and all the core Hadoop services are up and running.
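A quick way to confirm the prerequisites is to check that the relevant launchers are on the PATH. The following is a minimal sketch; the helper name have_cmd is made up for this example, and the tool names are the usual launcher names, assumed here and not taken from a specific setup.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check the tools Hive depends on.
for tool in java hadoop hdfs; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If all three are found, running jps should additionally list the Hadoop daemons (for example NameNode and DataNode) when the services are up.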
The environment used in this setup is Ubuntu 18.04, and the Hive version is 3.1.2.
Note that Java 8 is recommended for this setup: since Java 9 the default application class loader is no longer a URLClassLoader, which Hive 3.1.2 relies on to run.
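To see which Java version is active before going further, the version string can be reduced to its major number. This is a sketch under the assumption that java -version prints the version in double quotes on its first matching line, as current JDK builds do; the function name java_major is hypothetical.

```shell
#!/bin/sh
# Extract the major Java version from a version string.
# Legacy scheme "1.8.0_292" yields 8; newer scheme "11.0.11" yields 11.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it before parsing.
    ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major "$ver")
    if [ "$major" = "8" ]; then
        echo "Java 8 detected ($ver)"
    else
        echo "Java major version is '$major' ($ver); Java 8 is recommended for Hive 3.1.2"
    fi
else
    echo "java not found on PATH"
fi
```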
Steps to start Apache Hive 3.x installation on Ubuntu
To start the installation, first download the latest stable release from (http://apachemirror.wuchna.com/hive/hive-3.1.2/); for older versions visit (http://apachemirror.wuchna.com/hive/).
Secondly, download the preferred release using the commands below. (Both the directory location and the download link can be changed as you prefer.)
cd /usr/local
sudo wget http://apachemirror.wuchna.com/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Thirdly, the Hive archive needs to be extracted in the same location.
sudo tar xvzf apache-hive-3.1.2-bin.tar.gz
Fourthly, the extracted folder is renamed to hive.
sudo mv apache-hive-3.1.2-bin hive
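The download, extract and rename steps can also be collected into one script with the version kept in a single variable. The sketch below uses the same mirror and paths as above; DRY_RUN and the run helper are hypothetical additions for this example, so the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: the download, extract and rename steps with the version in one place.
# DRY_RUN is a hypothetical switch added for this example; it is not a Hive feature.
HIVE_VERSION=${HIVE_VERSION:-3.1.2}
MIRROR=${MIRROR:-http://apachemirror.wuchna.com/hive}
INSTALL_DIR=${INSTALL_DIR:-/usr/local}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
URL="${MIRROR}/hive-${HIVE_VERSION}/${TARBALL}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
run sudo wget -P "$INSTALL_DIR" "$URL" &&
run sudo tar xvzf "$INSTALL_DIR/$TARBALL" -C "$INSTALL_DIR" &&
run sudo mv "$INSTALL_DIR/apache-hive-${HIVE_VERSION}-bin" "$INSTALL_DIR/hive"
```

Keeping the version in one variable makes it easier to switch to another release from the same mirror without editing three separate commands.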
The next step is the addition of Hive Environment Variables
Here, the Hive path needs to be added to the environment; otherwise you would have to call the Hive commands by their full path.
Open the bashrc file by running:
nano ~/.bashrc
Next, add the lines below at the end of the bashrc file.
# Set HIVE_HOME
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Next, load the Hive environment variables into the current shell by running source ~/.bashrc.
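If the setup is likely to be repeated, the bashrc change can be scripted so that the lines are appended only once. This is a minimal sketch; the function name add_hive_env is hypothetical, and /usr/local/hive is assumed as the default because it is the install location used in this guide.

```shell
#!/bin/sh
# Append the Hive variables to a shell startup file only if they are not there yet.
# Usage: add_hive_env FILE [HIVE_HOME]
add_hive_env() {
    rc_file=$1
    hive_home=${2:-/usr/local/hive}
    if grep -q '^export HIVE_HOME=' "$rc_file" 2>/dev/null; then
        echo "HIVE_HOME already set in $rc_file"
        return 0
    fi
    {
        echo '# Set HIVE_HOME'
        echo "export HIVE_HOME=$hive_home"
        # Single quotes keep $PATH and $HIVE_HOME unexpanded in the file.
        echo 'export PATH=$PATH:$HIVE_HOME/bin'
    } >> "$rc_file"
}

# Intended use:
#   add_hive_env ~/.bashrc && . ~/.bashrc
```

Calling the function a second time leaves the file unchanged, which avoids duplicate PATH entries.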
Creating directories …
Here, we need to create Hive directories within HDFS.
hdfs dfs -mkdir -p /bigdata/tmp
Hive also needs a directory in which to save tables and other miscellaneous data, so we create another one.
hdfs dfs -mkdir -p /bigdata/hive/warehouse
Next, group write permission is added using the commands below.
hdfs dfs -chmod g+w /bigdata/tmp
hdfs dfs -chmod g+w /bigdata/hive/warehouse
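The directory creation and permission steps follow the same pattern for each path, so they can be expressed as one loop. In this sketch the function name setup_hive_dirs and the HDFS_CMD override are hypothetical; setting HDFS_CMD=echo previews the commands without touching HDFS.

```shell
#!/bin/sh
# Create the Hive directories in HDFS and make them group-writable.
# HDFS_CMD is a hypothetical override so the commands can be previewed with "echo".
HDFS_CMD=${HDFS_CMD:-hdfs}

setup_hive_dirs() {
    for dir in "$@"; do
        # -p creates missing parent directories such as /bigdata.
        $HDFS_CMD dfs -mkdir -p "$dir" || return 1
        $HDFS_CMD dfs -chmod g+w "$dir" || return 1
    done
}

# Intended use:
#   setup_hive_dirs /bigdata/tmp /bigdata/hive/warehouse
```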
Steps to configure Hive …
Change the working directory to the Hive configuration location (cd $HIVE_HOME/conf).
Then open the hive-env.sh file using the command below.
sudo nano hive-env.sh
Next we need to add the following configurations to the end of the file (path changes could be made according to your setup).
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Java Home
export JAVA_HOME=/usr
The metastore is the central repository of Apache Hive metadata. It stores the metadata for Hive tables (such as their schema and directory location) and their partitions in a relational database.
Every Hive installation needs a metastore service to store this metadata. By default, Hive uses an embedded Derby database. You can also choose MySQL, Postgres, Oracle or MS SQL Server as the Hive metastore database.
We use MySQL for this configuration. The metastore configuration needs to be specified in the hive-site.xml file.
Let us start by installing the latest MySQL version available through apt-get. You may also skip this step if MySQL is already installed on the system.
sudo apt-get update
sudo apt-get install mysql-server
If the secure installation utility does not launch automatically once the installation completes, run it with the following command.
sudo mysql_secure_installation
The utility then prompts you to define the MySQL root password and make other security-related choices, including removing remote access for the root user.
sudo systemctl start mysql
This command starts the mysql service.
sudo systemctl enable mysql
This command ensures that the database server starts automatically after a system reboot.
Now that the MySQL server is installed, we need to install the MySQL Java connector by running the following command.
sudo apt-get install libmysql-java
For Hive to access the MySQL connector, either create a soft link to the connector in the Hive lib folder or copy the jar file into the Hive lib folder. Since the Hive directory was extracted with sudo, the link is created with sudo as well.
sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
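Because a missing or broken link is the usual cause of driver errors later on, it can help to wrap the linking step in a small check. The following is a sketch only; the function name link_connector is hypothetical, and the jar path in the usage comment is the one installed by libmysql-java in this guide.

```shell
#!/bin/sh
# Link a JDBC driver jar into a lib directory unless an entry with that name already exists.
# Usage: link_connector JAR LIB_DIR
link_connector() {
    jar=$1
    lib_dir=$2
    target="$lib_dir/$(basename "$jar")"
    if [ ! -f "$jar" ]; then
        echo "connector jar not found: $jar" >&2
        return 1
    fi
    # -L also catches a dangling symlink that -e would miss.
    if [ -e "$target" ] || [ -L "$target" ]; then
        echo "already present: $target"
        return 0
    fi
    ln -s "$jar" "$target"
}

# Intended use (run as a user that can write to the Hive lib folder):
#   link_connector /usr/share/java/mysql-connector-java.jar "$HIVE_HOME/lib"
```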
Next, create the initial database schema using the hive-schema-3.1.0.mysql.sql file (or the file that corresponds to your installed version of Hive) located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.
First log in to the mysql shell.
mysql -u root -p
Create database for the metastore using the below commands.
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql;
Hive needs its own MySQL user account to access the metastore. It is also very important to prevent this user account from creating or altering tables in the metastore database schema. The commands below create the account and grant it access to the metastore database only; revoking further privileges once the schema is loaded is an optional hardening step.
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT ALL ON metastore.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
Now that we have created the metastore database and the Hive user in MySQL, let's define the metastore configuration in hive-site.xml.
Open the hive-site.xml file by entering the following commands.
cd /usr/local/hive/conf
sudo nano hive-site.xml
Then add the following configuration.
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to mysql server</description>
  </property>
</configuration>
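Since a typo in a property name fails silently until Hive starts, a quick check of the file can save time. The sketch below simply looks for the four property names used in this configuration; the function name missing_metastore_props is hypothetical, and the check does not validate the XML itself.

```shell
#!/bin/sh
# List the metastore connection properties that are missing from a hive-site.xml.
# Prints one missing property name per line; prints nothing if all four are present.
missing_metastore_props() {
    site_file=$1
    for prop in ConnectionURL ConnectionDriverName ConnectionUserName ConnectionPassword; do
        if ! grep -q "<name>javax.jdo.option.$prop</name>" "$site_file" 2>/dev/null; then
            echo "javax.jdo.option.$prop"
        fi
    done
}

# Intended use:
#   missing_metastore_props /usr/local/hive/conf/hive-site.xml
```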
Now let's try out the Hive console. Type hive in your terminal and press Enter.
If you see errors about the JDBC driver not being found, check that the driver was successfully linked or copied into the Hive lib folder.
If you get into the Hive console without any errors, you can verify your metastore configuration with the steps below.
Create a table in hive.
create table test(id int, name string);
Then exit from the hive console (Type exit and hit enter)
Let’s see if the table was successfully added in our metastore. Login to the mysql console using hive user credentials.
mysql -u hiveuser -p
Change the database and view tables.
use metastore;
select * from TBLS;
If you can see the test table listed in the TBLS table, then your installation was successful. If not, thoroughly check your mysql database and user configuration for the hive metastore.
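The same check can be done without opening the mysql console by piping the table names into a small filter. This is a sketch assuming the hiveuser account and metastore database created earlier; the function name table_in_list is hypothetical, while TBL_NAME is the column of TBLS that holds the table name.

```shell
#!/bin/sh
# Succeed if the given table name appears, one name per line, on standard input.
table_in_list() {
    grep -qxF "$1"
}

# Intended use (mysql prompts for the hiveuser password; -N omits the column header):
#   mysql -u hiveuser -p -N -e 'SELECT TBL_NAME FROM TBLS' metastore | table_in_list test \
#       && echo "test table is registered in the metastore"
```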