Configuring MySQL as Hive Metastore
In the previous article, we learned how to install and configure Hive with the default Derby metastore. However, an embedded Derby-based metastore can process only one request at a time. Since this is very restrictive, we will set up a traditional relational database (in this case MySQL) to handle multi-client concurrency. In this article we will configure a MySQL database as the Hive metastore.
All the metadata for Hive tables and partitions is accessed through the Hive Metastore. In our previous article we configured the metastore database as local/embedded. Now we will configure a remote Hive metastore database, followed by configuration of the metastore server. First we will start with the installation and configuration of MySQL Server on our EdgeNode.
MySQL Installation in EdgeNode
Log in to the EdgeNode and install MySQL Server 5.7.
root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install mysql-server-5.7
During installation the setup will prompt for the root user password; in this walkthrough we use hadoop1234. Then run the command below and follow the wizard-driven configuration: answer NO to "Disallow root login remotely" and YES to "Reload privilege tables".
root@EdgeNode:~# mysql_secure_installation
Next start the MySQL service & validate the successful installation:
root@EdgeNode:~# service mysql start
Now that we have our database up and running, we need to change some settings to allow connections from other nodes. We need to configure MySQL so that it can be accessed via its private IP, so that MapReduce programs or Sqoop can connect to this database server. Open the file mysqld.cnf and, under the [mysqld] section, set bind-address to the IP address of the EdgeNode, 10.0.100.3.
root@EdgeNode:~# vi /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
..
...
....
# localhost which is more compatible and is not less secure.
bind-address = 10.0.100.3
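The same edit can be scripted instead of done by hand in vi; a minimal sketch, demonstrated on a temporary copy of the file (point CNF at /etc/mysql/mysql.conf.d/mysqld.cnf on the real system, and back it up first):

```shell
# Sketch: non-interactive edit of bind-address, shown on a temp file
CNF=$(mktemp)
printf '[mysqld]\nbind-address = 127.0.0.1\n' > "$CNF"
# Rewrite the bind-address line to the EdgeNode's private IP
sed -i 's/^bind-address.*/bind-address = 10.0.100.3/' "$CNF"
grep '^bind-address' "$CNF"   # bind-address = 10.0.100.3
rm -f "$CNF"
```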
Finally, restart the service and validate the MySQL version.
root@EdgeNode:~# service mysql restart
root@EdgeNode:~# mysql --version
Next, log in to MySQL as root and manage the database as required.
root@EdgeNode:~# mysql -u root -p
mysql> show databases;
mysql> use mysql;
mysql> show tables;
mysql> quit;
Going forward we will need to create a few databases and users and grant privileges. We will use one database as the Hive metastore, and another as a sample source for Sqoop imports.
Now let us configure Hive with MySQL as the metastore. Let's create a database in MySQL, say metastore, and a user, say hive. Once the metastore is configured, we will use the Beeline CLI to submit HiveQL instead of the Hive CLI we used in our previous article.
root@EdgeNode:/usr/local/hive/hive_work# mysql -u root -p
mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'localhost';
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'%';
mysql> CREATE USER 'hive'@'10.0.100.3' IDENTIFIED BY 'hadoop1234';
mysql> GRANT ALL ON metastore.* TO 'hive'@'10.0.100.3';
mysql> FLUSH PRIVILEGES;
mysql> quit;
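The statements above can also be kept in a reusable script and replayed with `mysql -u root -p < hive_grants.sql`. A sketch (the filename hive_grants.sql is our own choice; the IF NOT EXISTS clauses require MySQL 5.7.6 or later):

```shell
# Sketch: save the metastore database/user/grant statements as a script
cat > hive_grants.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS metastore;
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hadoop1234';
GRANT ALL ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
EOF
grep -c ';' hive_grants.sql   # 4
```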
Hive needs the MySQL JDBC driver to connect to the MySQL database. The connector is available at http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz. Download it, untar it, and move the JDBC jar file into the Hive library directory.
root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive
root@EdgeNode:/usr/local/hive/hive_work# wget http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz
root@EdgeNode:/usr/local/hive/hive_work# tar -xzf mysql-connector-java-5.0.8.tar.gz >> /dev/null
root@EdgeNode:/usr/local/hive/hive_work# mv mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/hive_work# rm -rf mysql-connector-java-5.0.8
root@EdgeNode:/usr/local/hive/hive_work# rm mysql-connector-java-5.0.8.tar.gz
Next we need to create the Hive configuration file and initialise the metastore server.
root@EdgeNode:/usr/local/hive/hive_work# cd /usr/local/hive/conf/
HiveServer2 with Remote MySQL Metastore
Once we start using MySQL as the metastore, we will be able to handle multiple concurrent sessions. But in order to enable multiple clients to connect to the Hive database, we need to start the HiveServer2 interface. This will enable remote clients to submit queries against this Hive metastore.
root@EdgeNode:/usr/local/hive/conf# cp hive-default.xml.template hive-site.xml
Now find & replace the following Key/Name with the corresponding Value pairs in the file hive-site.xml as shown below:
root@EdgeNode:/usr/local/hive/conf# vi hive-site.xml
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.0.100.3:3306/metastore?createDatabaseIfNotExist=true</value>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop1234</value>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive/${hive.session.id}_resources</value>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<name>hive.scratch.dir.permission</name>
<value>777</value>
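In hive-site.xml each of these name/value pairs sits inside a &lt;property&gt; element; as an illustration, the metastore URI entry takes this shape (the description text is our own wording):

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
```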
The very next step is to initialise the MySQL metastore schema using the schematool utility.
root@EdgeNode:/usr/local/hive/conf# cd /usr/local/hive/bin
root@EdgeNode:/usr/local/hive/bin# schematool -dbType mysql -userName hive -passWord hadoop1234 -initSchema
Once the schema initialisation completes successfully, we need to start the Hive metastore service.
root@EdgeNode:/usr/local/hive/bin# hive --service metastore
Once the Hive metastore server is up and running, open another shell session on the EdgeNode. Next we will connect to Hive using the Beeline CLI.
root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# cp /usr/local/hive/jdbc/hive-jdbc-2.1.0-standalone.jar /usr/local/hive/lib/
root@EdgeNode:/usr/local/hive/bin# beeline
Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2:// hive hadoop1234
Once we are connected to the Hive metastore, let's quickly run some HiveQL.
beeline> SHOW DATABASES;
beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit
Now let us start HiveServer2. Log in to the EdgeNode in another SSH session and start the server as below:
root@EdgeNode:~# cd $HIVE_HOME/bin
root@EdgeNode:/usr/local/hive/bin# hiveserver2
Once HiveServer2 is up and running, let's log back in to the Beeline CLI from our earlier shell session on the EdgeNode.
root@EdgeNode:/usr/local/hive/bin# beeline
beeline> !connect jdbc:hive2://localhost:10000 hive hadoop1234
beeline> SHOW DATABASES;
beeline> USE default;
beeline> SHOW TABLES;
beeline> !quit
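For scripting, the same session can be run non-interactively with Beeline's standard -u/-n/-p/-e options. The sketch below only composes and prints the command, since executing it needs a running HiveServer2:

```shell
# Sketch: compose a non-interactive Beeline invocation against HiveServer2
HS2_URL="jdbc:hive2://localhost:10000"
BEELINE_CMD="beeline -u ${HS2_URL} -n hive -p hadoop1234 -e 'SHOW DATABASES;'"
echo "$BEELINE_CMD"
# On the EdgeNode, run it with: eval "$BEELINE_CMD"
```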
Finally, we have configured Hive in remote metastore mode. We will be using Hive a lot at a later stage. Next, let us look at data provisioning from an RDBMS into HDFS, and further into Hive tables, for big data analytics. For that we will first install the Apache Sqoop utility.