In the previous article, we learned how to install and configure Hive with the default Derby metastore. However, an embedded Derby-based metastore can process only one request at a time. Since this is very restrictive, we will set up a traditional relational database (in this case MySQL) to handle multi-client concurrency. In this article we will configure a MySQL database as the Hive Metastore.

All the metadata for Hive tables and partitions is accessed through the Hive Metastore. In our previous article we configured the metastore database as Local/Embedded. Now we will configure a Remote Metastore database for Hive, followed by configuration of the Metastore Server. First we will start with the installation & configuration of MySQL Server on our EdgeNode.

MySQL Installation in EdgeNode

Log in to the EdgeNode & start with the MySQL Server 5.7 installation.

root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install mysql-server-5.7

During installation the setup will prompt for the root user password. In our case let's set the password to, say, hadoop1234. Next, follow the wizard-driven configuration for the command below. Answer NO to "Disallow root login remotely?" and finally YES to "Reload privilege tables?".

root@EdgeNode:~# mysql_secure_installation

Next start the MySQL service & validate the successful installation:

root@EdgeNode:~# service mysql start

Now that we have our database up and running, we need to change some values to allow connections from other nodes. We need to configure MySQL so that it can be accessed via its private IP, allowing MapReduce programs or Sqoop to connect to this database server. Open the file mysqld.cnf and, under the [mysqld] section, set the bind-address to the IP address of the EdgeNode, 10.0.100.3.

root@EdgeNode:~# vi /etc/mysql/mysql.conf.d/mysqld.cnf

[mysqld]
...
# localhost which is more compatible and is not less secure.
bind-address            = 10.0.100.3

Finally, restart the service and validate the MySQL version.

root@EdgeNode:~# service mysql restart
root@EdgeNode:~# mysql --version
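Optionally (not part of the original steps), we can confirm that MySQL is now listening on the EdgeNode's private IP rather than only on localhost. The netstat utility may require the net-tools package on newer Ubuntu releases:

root@EdgeNode:~# netstat -tlnp | grep 3306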

Next, log in to MySQL using the root account and manage the database as required.

root@EdgeNode:~# mysql -u root -p
mysql> show databases;
mysql> use mysql;
mysql> show tables;
mysql> quit;

Going forward it may be necessary to create a few databases & users and set grant privileges. In our case we will use one database as the Hive metastore, and another one as a sample source for SQOOP imports etc.

Now let us configure Hive with MySQL as the Metastore. Let's create a database in MySQL, say metastore, and a user, say hive. Once we have configured our Metastore we will be using the Beeline CLI to submit HiveQL, instead of the Hive CLI we used in our previous article.

root@EdgeNode:/usr/local/hive/hive_work# mysql -u root -p
mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'localhost';
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'%';
mysql> CREATE USER 'hive'@'10.0.100.3' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'10.0.100.3';
mysql> FLUSH PRIVILEGES;
mysql> quit;
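As a quick optional check, we can log in as the newly created hive user against the private IP and confirm that the metastore database and grants are in place:

root@EdgeNode:/usr/local/hive/hive_work# mysql -u hive -p -h 10.0.100.3
mysql> SHOW GRANTS;
mysql> SHOW DATABASES;
mysql> quit;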

Get the MySQL JDBC driver so that Hive can connect to the MySQL database. We will download the MySQL JDBC connector and move the jar file into the Hive library. The connector is available at http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz. Download the driver, untar it and move the JDBC jar file into the Hive library.

root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive
root@EdgeNode:/usr/local/hive/hive_work# wget http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz
root@EdgeNode:/usr/local/hive/hive_work# tar -xzf mysql-connector-java-5.0.8.tar.gz >> /dev/null
root@EdgeNode:/usr/local/hive/hive_work# mv mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/hive_work# rm -rf mysql-connector-java-5.0.8
root@EdgeNode:/usr/local/hive/hive_work# rm mysql-connector-java-5.0.8.tar.gz
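A simple sanity check confirms that the connector jar is now available in the Hive library path:

root@EdgeNode:/usr/local/hive/hive_work# ls -l /usr/local/hive/lib/ | grep mysql-connector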

Next we need to create the Hive configuration file and then initialise the Metastore.

root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive/conf/

HiveServer2 with Remote MySQL Metastore

Once we start using MySQL as the Metastore, we will be able to handle multiple concurrent sessions. But in order to enable multiple clients to connect to the Hive database, we need to start the HiveServer2 interface. This will enable remote clients to submit queries against this Hive Metastore.

root@EdgeNode:/usr/local/hive/conf# cp hive-default.xml.template hive-site.xml

Now find & replace the following key names with the corresponding value pairs in the file hive-site.xml, as shown below:

root@EdgeNode:/usr/local/hive/conf# vi hive-site.xml


<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.0.100.3:3306/metastore?createDatabaseIfNotExist=true</value>

<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>

<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>

<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop1234</value>

<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>

<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>

<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive/${hive.session.id}_resources</value>

<name>hive.server2.enable.doAs</name>
<value>false</value>

<name>hive.scratch.dir.permission</name>
<value>777</value>
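Note that in hive-site.xml each of these name/value pairs sits inside a <property> element; we are only changing the <value> of entries that already exist in the template. For illustration, the metastore URI entry should end up looking like this:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>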

The very next step is to initialise the MySQL Metastore schema using the SchemaTool utility.

root@EdgeNode:/usr/local/hive/conf# cd /usr/local/hive/bin
root@EdgeNode:/usr/local/hive/bin# schematool -dbType mysql -userName hive -passWord hadoop1234 -initSchema
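Optionally, we can verify the result of the initialisation: schematool can report the schema version it created, and the metastore tables should now be visible in MySQL:

root@EdgeNode:/usr/local/hive/bin# schematool -dbType mysql -info
root@EdgeNode:/usr/local/hive/bin# mysql -u hive -p -e "USE metastore; SHOW TABLES;"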

Once the schema initialisation has completed successfully, we need to start the Hive metastore service.

root@EdgeNode:/usr/local/hive/bin# hive --service metastore
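The metastore service runs in the foreground and keeps this shell occupied. As an alternative (not part of the original steps), it can be pushed to the background and we can then confirm that the Thrift service is listening on port 9083:

root@EdgeNode:/usr/local/hive/bin# nohup hive --service metastore > /tmp/metastore.log 2>&1 &
root@EdgeNode:/usr/local/hive/bin# netstat -tlnp | grep 9083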

Once the Hive Metastore server is up & running, open another shell session on the EdgeNode. Next we will connect to Hive using the Beeline CLI.

root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# cp /usr/local/hive/jdbc/hive-jdbc-2.1.0-standalone.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/bin# beeline

Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2:// hive hadoop1234

Once we are connected to the Hive metastore, let's quickly run some HiveQL.

beeline> SHOW DATABASES;
beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit
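To convince ourselves that table metadata really lands in MySQL now, we can optionally create a throwaway table from Beeline and look it up in the metastore database. TBLS is one of the tables created by the schema initialisation; the test table name below is arbitrary and can be dropped afterwards:

beeline> !connect jdbc:hive2:// hive hadoop1234
beeline> CREATE TABLE metastore_check (id INT);
beeline> !quit

root@EdgeNode:~# mysql -u hive -p -e "SELECT TBL_NAME, TBL_TYPE FROM metastore.TBLS;"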

Now let us start HiveServer2. Log in to the EdgeNode in another SSH session and start the server as below:

root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# hiveserver2

Once HiveServer2 is up & running, let's log back in to the Beeline CLI using our previous shell session on the EdgeNode.

root@EdgeNode:/usr/local/hive/bin# beeline
beeline> !connect jdbc:hive2://localhost:10000 hive hadoop1234
beeline> SHOW DATABASES;
beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit
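Since HiveServer2 is the interface meant for remote clients, a client on another node can also connect over the network by pointing Beeline at the EdgeNode's private IP instead of localhost, assuming Beeline is available on that node and port 10000 is reachable:

beeline> !connect jdbc:hive2://10.0.100.3:10000 hive hadoop1234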

Finally we have configured Hive in Remote Metastore mode. We will be using Hive a lot at a later stage. Now let us go forward and take a look at data provisioning from an RDBMS into HDFS & further into Hive tables for big data analytics. For that we will first install the Apache SQOOP utility.

