Install HBase in a Hadoop Cluster
Apache HBase provides large-scale tabular storage for Hadoop on top of the Hadoop Distributed File System (HDFS). It is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. HBase is used when we need random, real-time read/write access to Big Data. With HBase we can host very large tables (billions of rows by millions of columns) on clusters of commodity hardware. In this article we will install HBase in a fully distributed Hadoop cluster.
HBase scales by splitting the rows of a table into regions. Each region is hosted by exactly one Region Server at a time. Writes are held, sorted, in memory until they are flushed to disk. Reads merge the rows in memory with the flushed files. Reads and writes to a single row are consistent. A row is a container of key-value pairs identified by a single row key, and operations on a single row are atomic. The in-memory data is flushed to disk periodically, but a row does not have to be flushed into just a single file. It can be split across different store files with different properties, and reads can look at just a subset of them. This design option is called Column Families: a column family groups columns into the same physical files. Like many distributed NoSQL stores, HBase provides neither joins nor secondary indexes out of the box.
HBase Installation
We will configure our cluster to host the HBase Master Server on our NameNode and the Region Servers on our DataNodes. Apache ZooKeeper is a prerequisite for an HBase installation. In this case we will configure HBase to manage its own ZooKeeper instance, and the NameNode will host the ZooKeeper quorum. So let us SSH into our NameNode.
Master Server Setup
Get the latest stable release of the HBase package from the site:
http://www-us.apache.org/dist/hbase/stable
At the time of writing this article, HBase 1.2.3 is the latest stable version. We will install HBase under the /usr/local/ directory.
root@NameNode:~# cd /usr/local
root@NameNode:/usr/local/# wget http://www-us.apache.org/dist/hbase/stable/hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# tar -xzf hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# mv hbase-1.2.3 /usr/local/hbase
root@NameNode:/usr/local/# rm hbase-1.2.3-bin.tar.gz
Set the HBase environment variables in the .bashrc file. Append the lines below to the file and then source it.
root@NameNode:/usr/local# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@NameNode:/usr/local# source ~/.bashrc
Next we need to configure the HBase environment script and set the Java home. We will also configure HBase to manage its own ZooKeeper instance.
Open the hbase-env.sh file and append the following lines.
root@NameNode:/usr/local# cd /usr/local/hbase/conf
root@NameNode:/usr/local/hbase/conf# vi hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export HBASE_MANAGES_ZK=true
Next we will configure the site-specific properties of HBase in the file hbase-site.xml.
root@NameNode:/usr/local/hbase/conf# vi hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://NameNode:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>NameNode</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
Next we have to list the DataNodes that will host the Region Servers in the regionservers file.
root@NameNode:/usr/local/hbase/conf# vi regionservers
DataNode1
DataNode2
Additionally, we will create the local directory in which ZooKeeper keeps its data, matching hbase.zookeeper.property.dataDir above.
root@NameNode:/usr/local/hbase/conf# mkdir -p /usr/local/zookeeper
Region Server Setup
Now we have to configure our DataNodes to act as Region Servers. In our case we have two DataNodes. We will secure copy the hbase directory with the binaries and configuration files from the NameNode to the DataNodes.
root@NameNode:/usr/local/hbase/conf# cd /usr/local 
root@NameNode:/usr/local# scp -r hbase DataNode1:/usr/local
root@NameNode:/usr/local# scp -r hbase DataNode2:/usr/local
Next we need to update the HBase environment configuration on all the DataNodes. Append the three lines below to the .bashrc file on each DataNode.
root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@DataNode1:~# source ~/.bashrc
root@DataNode1:~# exit
Repeat the above step for all the other DataNodes.
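With more than a couple of DataNodes, repeating the .bashrc edit by hand gets tedious. The following is a minimal sketch of how the step could be scripted from the NameNode. It assumes passwordless root SSH to each DataNode, which the scp steps above already rely on. The function names hbase_env_lines and configure_nodes and the DRY_RUN switch are made up for this illustration and are not part of HBase.

```shell
#!/bin/sh
# Sketch only: append the HBase environment lines to ~/.bashrc on each DataNode.
# DRY_RUN=1 (the default here) prints what would be done instead of doing it.
DRY_RUN=${DRY_RUN:-1}

hbase_env_lines() {
  # Quoted heredoc: $PATH, $HBASE_HOME and $CLASSPATH stay literal here
  # and are expanded later, when the remote .bashrc is sourced.
  cat <<'EOF'
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
EOF
}

configure_nodes() {
  # Arguments: one host name per DataNode, e.g. DataNode1 DataNode2
  for node in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "would append HBase environment lines to root@$node:~/.bashrc"
    else
      # Skip the append if HBASE_HOME is already mentioned in the remote .bashrc.
      hbase_env_lines | ssh "root@$node" \
        'grep -q HBASE_HOME ~/.bashrc 2>/dev/null || cat >> ~/.bashrc'
    fi
  done
}
```

After sourcing the script, calling configure_nodes DataNode1 DataNode2 prints the planned actions. Setting DRY_RUN=0 before the call performs the actual append over SSH.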
We are now done with the installation and configuration, so it is time to start the HBase services.
root@NameNode:/usr/local# $HBASE_HOME/bin/start-hbase.sh
Let us validate the services running on the NameNode as well as on the DataNodes.
root@NameNode:/usr/local# jps
5721 NameNode
5943 SecondaryNameNode
6103 ResourceManager
6217 JobHistoryServer
6752 HQuorumPeer
6813 HMaster
7031 Jps
root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# jps
3869 DataNode
4004 NodeManager
4196 HRegionServer
4444 Jps
root@DataNode1:~# exit
Quickly validate the installation.
root@NameNode:/usr/local# hbase version
HBase 1.2.3
Source code repository git://kalashnikov.att.net/Users/stack/checkouts/hbase.git.commit revision=bd63744624a26dc3350137b564fe746df7a721a4
Compiled by stack on Mon Aug 29 15:13:42 PDT 2016
From source with checksum 0ca49367ef6c3a680888bbc4f1485d18
Now let us start the HBase shell and try some commands.
root@NameNode:~# $HBASE_HOME/bin/hbase shell
hbase(main):001:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
hbase(main):002:0> list
TABLE
0 row(s) in 0.0610 seconds
=> []
hbase(main):003:0> exit
Configure EdgeNode to access HBase
Let us configure our EdgeNode (client node) to access HBase. In later articles we will look at how Hive and HBase interact. Log on to the EdgeNode and secure copy the HBase directory from the NameNode.
root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hbase /usr/local/
After that we will set the environment variables accordingly.
root@EdgeNode:/usr/local# vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@EdgeNode:/usr/local# source ~/.bashrc
Next, log in to the HBase shell. We will create a table with column families, put some data into the table, and then scan and get the data.
root@EdgeNode:~# $HBASE_HOME/bin/hbase shell
hbase(main):001:0> create 'cdr', 'index', 'customer', 'type', 'timing', 'usage', 'correspondent', 'network'
0 row(s) in 1.8530 seconds
=> Hbase::Table - cdr
hbase(main):002:0> put 'cdr', '010', 'index:customerindex', '0'
0 row(s) in 0.3470 seconds
hbase(main):003:0> put 'cdr', '010', 'index:customercount', '1'
0 row(s) in 0.0110 seconds
hbase(main):004:0> put 'cdr', '010', 'index:patterncdrindex', '0'
0 row(s) in 0.0210 seconds
hbase(main):005:0> put 'cdr', '010', 'index:customercdrcount', '10'
0 row(s) in 0.0190 seconds
hbase(main):006:0> put 'cdr', '010', 'index:customerpatternduration', '900'
0 row(s) in 0.0200 seconds
hbase(main):007:0> put 'cdr', '010', 'index:customerprofileduration', '900'
0 row(s) in 0.0090 seconds
hbase(main):008:0> put 'cdr', '010', 'index:profilemarker', 'Profile #1'
0 row(s) in 0.0070 seconds
hbase(main):009:0> put 'cdr', '010', 'index:patternmarker', 'Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.'
0 row(s) in 1.2350 seconds
hbase(main):010:0> put 'cdr', '010', 'customer:cust_imsi', '208100000000000'
0 row(s) in 0.0210 seconds
hbase(main):011:0> put 'cdr', '010', 'customer:cust_isdn', '0600000000'
0 row(s) in 0.0110 seconds
hbase(main):012:0> put 'cdr', '010', 'customer:cust_imei', '350000000000000'
0 row(s) in 0.0240 seconds
hbase(main):013:0> put 'cdr', '010', 'customer:custoperator', 'FRAF2'
0 row(s) in 0.0160 seconds
hbase(main):014:0> put 'cdr', '010', 'type:calltype', 'MOC'
0 row(s) in 0.0340 seconds
hbase(main):015:0> put 'cdr', '010', 'type:callservice', 'Voice'
0 row(s) in 0.0090 seconds
hbase(main):016:0> list 'cdr'
TABLE
cdr
1 row(s) in 0.2020 seconds
hbase(main):017:0> scan 'cdr'
ROW                                           COLUMN+CELL
 010                                          column=customer:cust_imei, timestamp=1473821545112, value=350000000000000
 010                                          column=customer:cust_imsi, timestamp=1473821544911, value=208100000000000
 010                                          column=customer:cust_isdn, timestamp=1473821544985, value=0600000000
 010                                          column=customer:custoperator, timestamp=1473821546676, value=FRAF2
 010                                          column=index:customercdrcount, timestamp=1473821256779, value=10
 010                                          column=index:customercount, timestamp=1473821252725, value=1
 010                                          column=index:customerindex, timestamp=1473821252538, value=0
 010                                          column=index:customerpatternduration, timestamp=1473821286370, value=900
 010                                          column=index:customerprofileduration, timestamp=1473821286440, value=900
 010                                          column=index:patterncdrindex, timestamp=1473821252822, value=0
 010                                          column=index:patternmarker, timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
 010                                          column=index:profilemarker, timestamp=1473821286471, value=Profile #1
 010                                          column=type:callservice, timestamp=1473821566933, value=Voice
 010                                          column=type:calltype, timestamp=1473821566853, value=MOC
1 row(s) in 0.0670 seconds
hbase(main):018:0> get 'cdr', '010'
COLUMN                                        CELL
 customer:cust_imei                           timestamp=1473821545112, value=350000000000000
 customer:cust_imsi                           timestamp=1473821544911, value=208100000000000
 customer:cust_isdn                           timestamp=1473821544985, value=0600000000
 customer:custoperator                        timestamp=1473821546676, value=FRAF2
 index:customercdrcount                       timestamp=1473821256779, value=10
 index:customercount                          timestamp=1473821252725, value=1
 index:customerindex                          timestamp=1473821252538, value=0
 index:customerpatternduration                timestamp=1473821286370, value=900
 index:customerprofileduration                timestamp=1473821286440, value=900
 index:patterncdrindex                        timestamp=1473821252822, value=0
 index:patternmarker                          timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
 index:profilemarker                          timestamp=1473821286471, value=Profile #1
 type:callservice                             timestamp=1473821566933, value=Voice
 type:calltype                                timestamp=1473821566853, value=MOC
14 row(s) in 0.1050 seconds
hbase(main):019:0> exit
Check the HBase Web UI at http://10.0.0.1:16010. In our case the HBase Master runs on the NameNode.
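The put commands typed in the shell session above are repetitive, so they can also be generated by a small script. The following is a minimal sketch that prints one put statement per column=value argument. The helper name gen_puts is made up for this illustration, and values containing single quotes are not escaped.

```shell
#!/bin/sh
# Sketch only: print HBase shell 'put' statements for one row.
# Usage: gen_puts TABLE ROWKEY family:qualifier=value ...
gen_puts() {
  table=$1
  row=$2
  shift 2
  for pair in "$@"; do
    column=${pair%%=*}   # text before the first '='
    value=${pair#*=}     # text after the first '='
    printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$column" "$value"
  done
}
```

For example, gen_puts cdr 010 type:calltype=MOC type:callservice=Voice prints the two put statements used earlier for the type column family. Assuming the installed HBase shell supports the -n (non-interactive) flag, the output could be piped into hbase shell -n instead of being typed by hand.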
In the next article, we will learn about Apache Spark for in-memory big data analytics.