Install HBase in a Hadoop Cluster
Apache HBase provides large-scale tabular storage for Hadoop on top of the Hadoop Distributed File System (HDFS). It is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. HBase is used when we need random, real-time read/write access to Big Data. With HBase we can host very large tables (billions of rows by millions of columns) on clusters of commodity hardware. In this article we will install HBase in a fully distributed Hadoop cluster.
HBase scales by splitting a table's rows into regions. Each region is hosted by exactly one server. Writes are held, sorted, in memory until they are flushed to disk. Reads merge the rows in memory with the flushed files. Reads and writes to a single row are consistent. A row is a key-value map identified by a single row key, where keys and values are plain byte arrays. Updates to a row are atomic, and the row is flushed to disk periodically. It does not have to be flushed into a single file, though: it can be split across different store files with different properties, so that reads only need to look at a subset. This design option is called Column Families: a column family groups columns into its own set of physical files. Like many distributed NoSQL stores, HBase provides neither joins nor built-in secondary indexes.
HBase Installation
We will configure our cluster to host the HBase Master Server on our NameNode and the Region Servers on our DataNodes. Apache ZooKeeper is a prerequisite for an HBase installation. In this case we will let HBase manage its own ZooKeeper instance, and we will configure the NameNode to host the ZooKeeper quorum. So let us ssh into our NameNode.
Master Server Setup
Get the latest stable release of the HBase package from the site:
http://www-us.apache.org/dist/hbase/stable
At the time of writing this article, HBase 1.2.3 is the latest stable version. We will install HBase under the /usr/local/ directory.
root@NameNode:~# cd /usr/local
root@NameNode:/usr/local/# wget http://www-us.apache.org/dist/hbase/stable/hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# tar -xzf hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# mv hbase-1.2.3 /usr/local/hbase
root@NameNode:/usr/local/# rm hbase-1.2.3-bin.tar.gz
Set the HBase environment variables in the .bashrc file. Append the lines below to the file and then source it.
root@NameNode:/usr/local# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@NameNode:/usr/local# source ~/.bashrc
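Because the same three exports are appended on every node, it can help to make this step safe to repeat. The following is only a minimal sketch, not part of the original setup: append_once and add_hbase_env are hypothetical helper names, and the paths are the ones assumed in this article.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
    local file="$1" line="$2"
    touch "$file"
    # -x matches the whole line, -F treats LINE as a fixed string, -q stays quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# add_hbase_env FILE: append the three exports used in this article (hypothetical helper).
add_hbase_env() {
    local file="$1"
    append_once "$file" 'export HBASE_HOME=/usr/local/hbase'
    append_once "$file" 'export PATH=$PATH:$HBASE_HOME/bin'
    append_once "$file" 'export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.'
}
```

Calling add_hbase_env ~/.bashrc a second time leaves the file unchanged, so the step can be re-run without piling up duplicate exports.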
Next we need to configure the HBase environment script and set the Java home. We will also configure HBase to manage its own ZooKeeper instance.
Open the hbase-env.sh file in /usr/local/hbase/conf and append the following lines.
root@NameNode:/usr/local/hbase/conf# vi hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export HBASE_MANAGES_ZK=true
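Next we will configure the site-specific properties of HBase in the file hbase-site.xml. The hbase.rootdir value should point at the same NameNode host and port as fs.defaultFS in Hadoop's core-site.xml.
root@NameNode:/usr/local/hbase/conf# vi hbase-site.xml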
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://NameNode:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>NameNode</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
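To double-check what HBase will read from this file, a property value can be pulled out with standard tools. This sketch assumes each name and value sits on its own line, as in the file above. get_hbase_prop is a hypothetical helper, not an HBase command, and it is no substitute for a real XML parser.

```shell
# get_hbase_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Assumption: one <name> or <value> element per line, as in this article's file.
get_hbase_prop() {
    local file="$1" name="$2"
    awk -v want="$name" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$file"
}
```

With the configuration shown above, get_hbase_prop /usr/local/hbase/conf/hbase-site.xml hbase.rootdir should print hdfs://NameNode:8020/hbase. An unknown property name prints nothing.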
Next we have to list the DataNodes that will host the Region Servers, one per line, in the file regionservers.
root@NameNode:/usr/local/hbase/conf# vi regionservers
DataNode1
DataNode2
Additionally, we will create the local directory named in hbase.zookeeper.property.dataDir, where ZooKeeper keeps its snapshots and transaction logs.
root@NameNode:/usr/local/hbase/conf# mkdir -p /usr/local/zookeeper
Region Server Setup
Now we have to configure our DataNodes to act as Region Servers. In our case we have two DataNodes. We will secure copy the hbase directory with the binaries and configuration files from the NameNode to the DataNodes.
root@NameNode:/usr/local/hbase/conf# cd /usr/local
root@NameNode:/usr/local# scp -r hbase DataNode1:/usr/local
root@NameNode:/usr/local# scp -r hbase DataNode2:/usr/local
Next we need to update the environment configuration of HBase on all the DataNodes. Append the three lines below to the .bashrc file on each DataNode.
root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@DataNode1:~# source ~/.bashrc
root@DataNode1:~# exit
Repeat the above step for all the other DataNodes.
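With more than a couple of DataNodes, repeating the copy by hand gets tedious. One possible approach, sketched below under the assumption that the regionservers file lists one host per line, is to print the copy command for every host and review the list before running anything. plan_hbase_rollout is a hypothetical helper name. Skipping lines that start with # is a convenience of this sketch, not a statement about how HBase itself parses the file.

```shell
# plan_hbase_rollout FILE: print one scp command per host listed in FILE.
# Nothing is executed here; the output is a plan to review first.
plan_hbase_rollout() {
    local file="$1" host
    while IFS= read -r host || [ -n "$host" ]; do
        # skip blank lines and lines starting with '#'
        case "$host" in ''|'#'*) continue ;; esac
        printf 'scp -r /usr/local/hbase %s:/usr/local\n' "$host"
    done < "$file"
}
```

Running plan_hbase_rollout /usr/local/hbase/conf/regionservers prints the commands. Piping that output to sh would execute them, assuming the same root ssh access used in the manual steps above.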
We are now done with the installation and configuration, so it is time to start the HBase services.
root@NameNode:/usr/local# $HBASE_HOME/bin/start-hbase.sh
Let us validate the services running in NameNode as well as in the DataNodes.
root@NameNode:/usr/local# jps
5721 NameNode
5943 SecondaryNameNode
6103 ResourceManager
6217 JobHistoryServer
6752 HQuorumPeer
6813 HMaster
7031 Jps
root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# jps
3869 DataNode
4004 NodeManager
4196 HRegionServer
4444 Jps
root@DataNode1:~# exit
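Reading jps listings by eye works for two nodes, but the same check can be scripted. The sketch below assumes jps prints a process id followed by the process name, as in the listings above. missing_daemons is a hypothetical helper name.

```shell
# missing_daemons NAME...: read jps output on stdin and print every NAME
# that does not appear as a process name (second column).
missing_daemons() {
    local output name
    output=$(cat)
    for name in "$@"; do
        printf '%s\n' "$output" \
            | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
            || printf '%s\n' "$name"
    done
}
```

On the NameNode this could be used as jps | missing_daemons HMaster HQuorumPeer, and for a DataNode as ssh root@DataNode1 jps | missing_daemons HRegionServer, provided jps is on the PATH of the remote non-interactive shell. Empty output means every expected daemon was found.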
Quickly validate the installation.
root@NameNode:/usr/local# hbase version
HBase 1.2.3
Source code repository git://kalashnikov.att.net/Users/stack/checkouts/hbase.git.commit revision=bd63744624a26dc3350137b564fe746df7a721a4
Compiled by stack on Mon Aug 29 15:13:42 PDT 2016
From source with checksum 0ca49367ef6c3a680888bbc4f1485d18
Now let us start HBase shell and check some commands.
root@NameNode:~# $HBASE_HOME/bin/hbase shell
hbase(main):001:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
hbase(main):002:0> list
TABLE
0 row(s) in 0.0610 seconds
=> []
hbase(main):003:0> exit
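The status line can also be checked from a script. The sketch below assumes the summary format shown above, with a number followed by the word dead. dead_region_servers is a hypothetical helper name.

```shell
# dead_region_servers: read HBase shell status output on stdin and print the
# number that precedes the word "dead" in the summary line.
dead_region_servers() {
    sed -n 's/.*, \([0-9][0-9]*\) dead,.*/\1/p' | head -n 1
}
```

One way to feed it is echo "status" | hbase shell -n | dead_region_servers, where -n runs the HBase shell in non-interactive mode. If the summary format differs in another HBase version, the function prints nothing.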
Configure EdgeNode to access HBase
Let us configure our EdgeNode (client node) to access HBase. Going forward, we will look at Hive and HBase interaction. Log on to the EdgeNode and secure copy the HBase directory from the NameNode.
root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hbase /usr/local/
After that we will set the environment variables accordingly.
root@EdgeNode:/usr/local# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@EdgeNode:/usr/local# source ~/.bashrc
Next, log in to the HBase shell. We will create a table with several column families, put some data into it, and then read it back with scan and get.
root@NameNode:~# $HBASE_HOME/bin/hbase shell
hbase(main):001:0> create 'cdr', 'index', 'customer', 'type', 'timing', 'usage', 'correspondent', 'network'
0 row(s) in 1.8530 seconds
=> Hbase::Table - cdr
hbase(main):002:0> put 'cdr', '010', 'index:customerindex', '0'
0 row(s) in 0.3470 seconds
hbase(main):003:0> put 'cdr', '010', 'index:customercount', '1'
0 row(s) in 0.0110 seconds
hbase(main):004:0> put 'cdr', '010', 'index:patterncdrindex', '0'
0 row(s) in 0.0210 seconds
hbase(main):005:0> put 'cdr', '010', 'index:customercdrcount', '10'
0 row(s) in 0.0190 seconds
hbase(main):006:0> put 'cdr', '010', 'index:customerpatternduration', '900'
0 row(s) in 0.0200 seconds
hbase(main):007:0> put 'cdr', '010', 'index:customerprofileduration', '900'
0 row(s) in 0.0090 seconds
hbase(main):008:0> put 'cdr', '010', 'index:profilemarker', 'Profile #1'
0 row(s) in 0.0070 seconds
hbase(main):009:0> put 'cdr', '010', 'index:patternmarker', 'Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.'
0 row(s) in 1.2350 seconds
hbase(main):010:0> put 'cdr', '010', 'customer:cust_imsi', '208100000000000'
0 row(s) in 0.0210 seconds
hbase(main):011:0> put 'cdr', '010', 'customer:cust_isdn', '0600000000'
0 row(s) in 0.0110 seconds
hbase(main):012:0> put 'cdr', '010', 'customer:cust_imei', '350000000000000'
0 row(s) in 0.0240 seconds
hbase(main):013:0> put 'cdr', '010', 'customer:custoperator', 'FRAF2'
0 row(s) in 0.0160 seconds
hbase(main):014:0> put 'cdr', '010', 'type:calltype', 'MOC'
0 row(s) in 0.0340 seconds
hbase(main):015:0> put 'cdr', '010', 'type:callservice', 'Voice'
0 row(s) in 0.0090 seconds
hbase(main):016:0> list 'cdr'
TABLE
cdr
1 row(s) in 0.2020 seconds
hbase(main):017:0> scan 'cdr'
ROW COLUMN+CELL
010 column=customer:cust_imei, timestamp=1473821545112, value=350000000000000
010 column=customer:cust_imsi, timestamp=1473821544911, value=208100000000000
010 column=customer:cust_isdn, timestamp=1473821544985, value=0600000000
010 column=customer:custoperator, timestamp=1473821546676, value=FRAF2
010 column=index:customercdrcount, timestamp=1473821256779, value=10
010 column=index:customercount, timestamp=1473821252725, value=1
010 column=index:customerindex, timestamp=1473821252538, value=0
010 column=index:customerpatternduration, timestamp=1473821286370, value=900
010 column=index:customerprofileduration, timestamp=1473821286440, value=900
010 column=index:patterncdrindex, timestamp=1473821252822, value=0
010 column=index:patternmarker, timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
010 column=index:profilemarker, timestamp=1473821286471, value=Profile #1
010 column=type:callservice, timestamp=1473821566933, value=Voice
010 column=type:calltype, timestamp=1473821566853, value=MOC
1 row(s) in 0.0670 seconds
hbase(main):018:0> get 'cdr', '010'
COLUMN CELL
customer:cust_imei timestamp=1473821545112, value=350000000000000
customer:cust_imsi timestamp=1473821544911, value=208100000000000
customer:cust_isdn timestamp=1473821544985, value=0600000000
customer:custoperator timestamp=1473821546676, value=FRAF2
index:customercdrcount timestamp=1473821256779, value=10
index:customercount timestamp=1473821252725, value=1
index:customerindex timestamp=1473821252538, value=0
index:customerpatternduration timestamp=1473821286370, value=900
index:customerprofileduration timestamp=1473821286440, value=900
index:patterncdrindex timestamp=1473821252822, value=0
index:patternmarker timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
index:profilemarker timestamp=1473821286471, value=Profile #1
type:callservice timestamp=1473821566933, value=Voice
type:calltype timestamp=1473821566853, value=MOC
14 row(s) in 0.1050 seconds
hbase(main):019:0> exit
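Typing many put statements by hand is error-prone. As a rough sketch, they could instead be generated from a tab-separated list of column and value pairs and piped into the HBase shell. hbase_puts is a hypothetical helper name, and the sketch assumes values contain neither tabs nor single quotes.

```shell
# hbase_puts TABLE ROW: read "family:qualifier<TAB>value" lines on stdin and
# print one HBase shell put statement per line. Nothing is sent to HBase here.
hbase_puts() {
    local table="$1" row="$2" column value tab
    tab=$(printf '\t')
    while IFS="$tab" read -r column value || [ -n "$column" ]; do
        # skip empty lines
        [ -n "$column" ] || continue
        printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$column" "$value"
    done
}
```

For example, printf 'type:calltype\tMOC\n' | hbase_puts cdr 010 prints the same put statement that was typed in the session above. Piping the generated statements into hbase shell -n would run them in non-interactive mode.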
Check the HBase Web UI at http://10.0.0.1:16010. In our case the HBase Master runs on the NameNode.
In the next article, we will learn Apache Spark: in-memory big data analytics.