In the previous article, we learnt how to install and configure Hive with the default Derby metastore. However, an embedded Derby-based metastore can process only one request at a time. Since this is very restrictive, we will set up a traditional relational database (in this case MySQL) to handle multi-client concurrency. In this article we will configure a MySQL database as the Hive Metastore.

All the metadata for Hive tables and partitions is accessed through the Hive Metastore. In our previous article we configured the metastore database as Local/Embedded. Now we will configure a Remote Metastore database for Hive, followed by configuration of the Metastore Server. First we will start with the installation & configuration of MySQL Server on our EdgeNode.

MySQL Installation in EdgeNode

Log in to our EdgeNode & start with the MySQL Server 5.7 installation.

root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install mysql-server-5.7

During installation the setup will prompt for the root user password. In our case let's set the password to, e.g., hadoop1234. Then follow the wizard-driven configuration for the command below: set "Disallow root login remotely" to NO & finally answer YES to "Reload privilege tables".

root@EdgeNode:~# mysql_secure_installation

Next start the MySQL service & validate the successful installation:

root@EdgeNode:~# service mysql start

Now that we have our database up and running, we need to change some values to allow connections from other nodes. We need to configure MySQL such that it can be accessed by its private IP, so that MapReduce programs or Sqoop can connect to this database server. Open the file mysqld.cnf and, under the [mysqld] tag, set the bind-address to the IP address of the EdgeNode.

root@EdgeNode:~# vi /etc/mysql/mysql.conf.d/mysqld.cnf

# localhost which is more compatible and is not less secure.
bind-address            = <EdgeNode Private IP>
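The same edit can be scripted instead of done by hand in vi. Below is a minimal sketch using sed; the file path `/tmp/mysqld.cnf.demo` and the address `192.168.1.10` are illustrative stand-ins for the real config path and your EdgeNode's private IP:

```shell
# Sketch: update bind-address non-interactively with sed.
# Substitute your EdgeNode's private IP and the real config path
# (/etc/mysql/mysql.conf.d/mysqld.cnf) when running on the node.
CNF=/tmp/mysqld.cnf.demo
EDGE_IP=192.168.1.10

# A minimal stand-in for the stock config file:
cat > "$CNF" <<'EOF'
[mysqld]
bind-address            = 127.0.0.1
EOF

# Replace the loopback address with the EdgeNode IP:
sed -i "s/^bind-address.*/bind-address            = ${EDGE_IP}/" "$CNF"

grep '^bind-address' "$CNF"
```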

Finally restart the service and validate MySQL version.

root@EdgeNode:~# service mysql restart
root@EdgeNode:~# mysql --version

Next login to MySQL using root account and manage the database as required.

root@EdgeNode:~# mysql -u root -p
mysql> show databases;
mysql> use mysql;
mysql> show tables;
mysql> quit;

Going forward it may be necessary to create a few databases and users & set grant privileges. In our case we will use one database as the Hive metastore, and another one acting as a sample source for SQOOP import etc.

Now let us configure Hive with MySQL as the Metastore. Let's create a database in MySQL, say metastore, and a user, say hive. Once we have configured our Metastore we will use the Beeline CLI to submit HiveQL, instead of the Hive CLI we used in our previous article.

root@EdgeNode:/usr/local/hive/hive_work# mysql -u root -p
mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'localhost';
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'%';
mysql> CREATE USER 'hive'@'' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'';
mysql> quit;
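The statements above can also be kept in a script file and replayed in one shot. A minimal sketch follows; the file name `/tmp/metastore_setup.sql` is illustrative, and the password matches the hadoop1234 used above (the `IF NOT EXISTS` clauses require MySQL 5.7+, which is what we installed):

```shell
# Sketch: save the metastore DDL in a script so it can be replayed.
cat > /tmp/metastore_setup.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS metastore;
CREATE USER IF NOT EXISTS 'hive'@'localhost' IDENTIFIED BY 'hadoop1234';
GRANT ALL ON metastore.* TO 'hive'@'localhost';
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hadoop1234';
GRANT ALL ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
EOF

# Then feed it to the server (prompts for the root password):
# mysql -u root -p < /tmp/metastore_setup.sql
```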

Next, get the MySQL JDBC driver so that Hive can connect to the MySQL database. The JDBC connector is available on the MySQL website. Download the driver, untar it and move the JDBC jar file to the Hive library directory.

root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive
root@EdgeNode:/usr/local/hive/hive_work# wget
root@EdgeNode:/usr/local/hive/hive_work# tar -xzf mysql-connector-java-5.0.8.tar.gz
root@EdgeNode:/usr/local/hive/hive_work# mv mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/hive_work# rm -rf mysql-connector-java-5.0.8
root@EdgeNode:/usr/local/hive/hive_work# rm mysql-connector-java-5.0.8.tar.gz

Next we need to initialise the Metastore Server and create the Hive configuration file.

root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive/conf/

HiveServer2 with Remote MySQL Metastore

Once we start using MySQL as the Metastore, we will be able to handle multiple concurrent sessions. But in order to enable multiple clients to connect to the Hive database, we need to start the HiveServer2 interface. This will enable remote clients to submit queries against this Hive Metastore.

root@EdgeNode:/usr/local/hive/conf# cp hive-default.xml.template hive-site.xml

Now find & replace the metastore connection Key/Name entries in the file hive-site.xml with the corresponding values:

root@EdgeNode:/usr/local/hive/conf# vi hive-site.xml

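The exact property values did not survive here; a typical hive-site.xml for a MySQL-backed remote metastore looks roughly like the following sketch. The host names, database name, user and password follow the examples used above and should be adjusted to your environment:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop1234</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
```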

The very next step is to initialise the MySQL Metastore schema using the schematool utility.

root@EdgeNode:/usr/local/hive/conf# cd /usr/local/hive/bin
root@EdgeNode:/usr/local/hive/bin# schematool -dbType mysql -userName hive -passWord hadoop1234 -initSchema

Once the schema initialisation has completed successfully, we need to start the Hive metastore service.

root@EdgeNode:/usr/local/hive/bin# hive --service metastore

Once the Hive Metastore Server is up & running, open another shell session to connect to the EdgeNode. Next we will connect to Hive using the Beeline CLI.

root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# cp /usr/local/hive/jdbc/hive-jdbc-2.1.0-standalone.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/bin# beeline

Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2:// hive hadoop1234

Once we are connected to the Hive metastore, let's quickly run some HiveQL.

beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit

Now let us start HiveServer2. Log in to the EdgeNode in another SSH session and start the server as below:

root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# hiveserver2

Once hiveserver2 is up & running, let's log back into the Beeline CLI using our previous shell session on the EdgeNode.

root@EdgeNode:/usr/local/hive/bin# beeline
beeline> !connect jdbc:hive2://localhost:10000 hive hadoop1234
beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit

Finally, we have configured Hive in Remote Metastore mode. We will be using Hive a lot at a later stage. Now let us move on and look at data provisioning from an RDBMS to HDFS & further into Hive tables for big data analytics. For that we will first install the Apache SQOOP utility.

