
Install HBASE in Hadoop Cluster

 
Updated on Oct 03, 2020

Apache HBase provides large-scale tabular storage for Hadoop on top of the Hadoop Distributed File System (HDFS). It is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. HBase is used where we need random, real-time read/write access to Big Data, and it can host very large tables (billions of rows × millions of columns) on clusters of commodity hardware. In this article we will install HBase in a fully distributed Hadoop cluster.

HBase scales by splitting the rows of each table into regions, which are contiguous ranges of row keys, and each region is hosted by exactly one server at a time. Writes are held, sorted, in memory until they are flushed to disk, and reads merge the in-memory rows with the flushed files. Reads and writes to a single row are consistent. A row is a map of key-value pairs identified by a single row key; it is updated atomically and flushed to disk periodically. A row does not have to be flushed into a single file, though: its columns can be split across different store files with different properties, so that reads can look at just a subset of them. This design option is called column families; a column family groups columns into their own physical files. Unlike a relational database, HBase supports neither joins nor secondary indexes; rows are indexed only by their row key.
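To make the key-range idea concrete, here is a toy model in bash, purely for illustration: a table is split at a few made-up start keys and each region is assigned to one of our DataNodes. The function finds the region, and therefore the server, that would hold a given row key. In a real cluster the Master assigns regions and clients locate them through the hbase:meta table; this sketch only mirrors the sorted-range lookup.

```shell
#!/usr/bin/env bash
# Toy model of HBase region lookup (illustration only, not HBase code).
# Each region covers row keys from its start key (inclusive) up to the next
# region's start key (exclusive). Boundaries and server names are made up.
export LC_ALL=C   # compare keys byte by byte, as HBase orders row keys

# "start_key server" pairs sorted by start key; the first region has an
# empty start key, so it begins at the lowest possible key.
REGIONS=(
  " DataNode1"
  "m DataNode2"
  "t DataNode1"
)

# region_server_for KEY: print the server hosting the region that holds KEY.
region_server_for() {
  local key="$1" entry start server owner=""
  for entry in "${REGIONS[@]}"; do
    start="${entry%% *}"   # text before the first space
    server="${entry#* }"   # text after the first space
    # The last region whose start key is <= KEY owns the row.
    if [[ -z "$start" || "$key" > "$start" || "$key" == "$start" ]]; then
      owner="$server"
    fi
  done
  echo "$owner"
}
```

For example, region_server_for 010 prints DataNode1 because "010" sorts before "m", while region_server_for pear falls in the region starting at "m" and prints DataNode2.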

HBase Installation

We will configure the cluster to host the HBase Master on the NameNode and the Region Servers on the DataNodes. Apache ZooKeeper is a prerequisite for HBase; in this case we will let HBase manage its own ZooKeeper instance, with the NameNode hosting the ZooKeeper quorum. So let us log in to the NameNode over SSH.

Master Server Setup

Get the latest stable release of the HBase package from:
http://www-us.apache.org/dist/hbase/stable

At the time of writing, HBase 1.2.3 was the latest stable version; superseded releases are moved to https://archive.apache.org/dist/hbase/. We will install HBase under the /usr/local/ directory.

root@NameNode:~# cd /usr/local
root@NameNode:/usr/local/# wget http://www-us.apache.org/dist/hbase/stable/hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# tar -xzf hbase-1.2.3-bin.tar.gz
root@NameNode:/usr/local/# mv hbase-1.2.3 /usr/local/hbase
root@NameNode:/usr/local/# rm hbase-1.2.3-bin.tar.gz
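Since the stable directory changes with every release, it can help to keep the version in one place. The sketch below builds the tarball URL from a version number and repeats the steps above. It assumes the Apache archive layout (https://archive.apache.org/dist/hbase/<version>/hbase-<version>-bin.tar.gz), where superseded releases such as 1.2.3 are kept; verify the URL before relying on it. The function names are our own.

```shell
#!/usr/bin/env bash
# ASSUMPTION: releases live under <base>/<version>/hbase-<version>-bin.tar.gz,
# the layout used by https://archive.apache.org/dist/hbase/ for 1.x releases.
HBASE_DIST_BASE="${HBASE_DIST_BASE:-https://archive.apache.org/dist/hbase}"

# hbase_tarball_url VERSION: print the download URL of the binary tarball.
hbase_tarball_url() {
  local version="$1"
  echo "${HBASE_DIST_BASE}/${version}/hbase-${version}-bin.tar.gz"
}

# install_hbase VERSION: download, unpack and move the release to
# /usr/local/hbase (run as root). The ( ) body runs in a subshell so the
# cd does not change the caller's working directory.
install_hbase() (
  version="$1"
  cd /usr/local &&
    wget -q "$(hbase_tarball_url "$version")" &&
    tar -xzf "hbase-${version}-bin.tar.gz" &&
    mv "hbase-${version}" /usr/local/hbase &&
    rm "hbase-${version}-bin.tar.gz"
)
```

Usage: install_hbase 1.2.3. Each step runs only if the previous one succeeded, so a failed download does not leave a half-moved directory behind.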

Set the HBase environment variables in the .bashrc file: append the lines below and then source the file.

root@NameNode:/usr/local# vi ~/.bashrc

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.
root@NameNode:/usr/local# source ~/.bashrc
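Re-running the .bashrc step, for example when rebuilding a node, easily leaves duplicate export lines. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper that appends a line only if that exact line is not already in the file.

```shell
#!/usr/bin/env bash
# append_once FILE LINE...: append each LINE to FILE unless FILE already
# contains that exact line, so the function can be re-run safely.
append_once() {
  local file="$1" line
  shift
  touch "$file"
  for line in "$@"; do
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}

# The HBase environment lines from this article, single-quoted so that
# $PATH, $HBASE_HOME and the * wildcard are left for the login shell.
HBASE_ENV_LINES=(
  'export HBASE_HOME=/usr/local/hbase'
  'export PATH=$PATH:$HBASE_HOME/bin'
  'export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.'
)
```

Usage on the NameNode: append_once ~/.bashrc "${HBASE_ENV_LINES[@]}" && source ~/.bashrc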

Next, configure the HBase environment script: set JAVA_HOME and let HBase manage its own ZooKeeper instance. Open hbase-env.sh and append the following lines.

root@NameNode:/usr/local# cd /usr/local/hbase/conf
root@NameNode:/usr/local/hbase/conf# vi hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export HBASE_MANAGES_ZK=true
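The JAVA_HOME value above matches the author's Oracle Java 7 install and will differ on other machines. One way to find it, sketched below, is to resolve the java binary on the PATH and strip the trailing /bin/java. This assumes the resolved path ends in <JAVA_HOME>/bin/java (as with Debian/Ubuntu alternatives) and that GNU readlink -f is available; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# java_home_from_binary PATH: strip /bin/java from a resolved java path.
# ASSUMPTION: the path has the form <JAVA_HOME>/bin/java.
java_home_from_binary() {
  local resolved="$1"
  case "$resolved" in
    */bin/java) echo "${resolved%/bin/java}" ;;
    *) echo "unexpected java path: $resolved" >&2; return 1 ;;
  esac
}

# detect_java_home: resolve the java on PATH through its symlinks (GNU
# readlink -f) and print the JAVA_HOME candidate for hbase-env.sh.
detect_java_home() {
  local bin
  bin="$(command -v java)" || return 1
  java_home_from_binary "$(readlink -f "$bin")"
}
```

The printed value can then be written into conf/hbase-env.sh as export JAVA_HOME=..., after checking that it points at the JDK or JRE you intend to use.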

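Next, configure the site-specific properties of HBase in hbase-site.xml. The host and port in hbase.rootdir must match the fs.defaultFS setting in Hadoop's core-site.xml.

root@NameNode:/usr/local/hbase/conf# vi hbase-site.xml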
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://NameNode:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>NameNode</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
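All nodes need an identical hbase-site.xml, so it can be generated from a few variables rather than typed by hand. The function below is a sketch that prints the same five properties shown above; its name and argument order are our own.

```shell
#!/usr/bin/env bash
# hbase_site_xml NN_HOST NN_PORT ZK_QUORUM ZK_DATA_DIR
# Print an hbase-site.xml for a fully distributed cluster with an
# HBase-managed ZooKeeper (same properties as in this article).
hbase_site_xml() {
  local nn_host="$1" nn_port="$2" quorum="$3" zk_dir="$4" i
  # name/value pairs, emitted in order as <property> elements
  local -a props=(
    hbase.rootdir "hdfs://${nn_host}:${nn_port}/hbase"
    hbase.cluster.distributed true
    hbase.zookeeper.property.dataDir "$zk_dir"
    hbase.zookeeper.quorum "$quorum"
    hbase.zookeeper.property.clientPort 2181
  )
  echo '<configuration>'
  for ((i = 0; i < ${#props[@]}; i += 2)); do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "${props[i]}" "${props[i+1]}"
  done
  echo '</configuration>'
}
```

Usage: hbase_site_xml NameNode 8020 NameNode /usr/local/zookeeper > /usr/local/hbase/conf/hbase-site.xml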

Next, list the DataNodes that will host the Region Servers in the regionservers file, one hostname per line.

root@NameNode:/usr/local/hbase/conf# vi regionservers

DataNode1
DataNode2

Additionally, create the local directory set in hbase.zookeeper.property.dataDir, where ZooKeeper stores its snapshots and transaction logs.

root@NameNode:/usr/local/hbase/conf# mkdir -p /usr/local/zookeeper

Region Server Setup

Now configure the DataNodes to act as Region Servers. In our case there are two DataNodes. Copy the hbase directory, with its binaries and configuration files, from the NameNode to each DataNode using scp.

root@NameNode:/usr/local/hbase/conf# cd /usr/local 
root@NameNode:/usr/local# scp -r hbase DataNode1:/usr/local
root@NameNode:/usr/local# scp -r hbase DataNode2:/usr/local

Next, update the HBase environment configuration on every DataNode by appending the following three lines to its .bashrc file.

root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# vi ~/.bashrc

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.


root@DataNode1:~# source ~/.bashrc
root@DataNode1:~# exit

Repeat the above steps on all the other DataNodes.
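With more than a couple of DataNodes, the copy and .bashrc steps can be driven by the regionservers file itself. The sketch below assumes passwordless root SSH from the NameNode (which the scp commands above already rely on); deploy_region_servers and DRY_RUN are hypothetical names. By default it only prints the commands, so the plan can be reviewed before running it with DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Copy /usr/local/hbase to every host in a regionservers file and add the
# HBase environment lines to each remote ~/.bashrc if they are missing.
# ASSUMPTION: passwordless root SSH from the NameNode to every DataNode.
ENV_LINES='export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.'

# run CMD...: execute CMD when DRY_RUN=0, otherwise just print it
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

deploy_region_servers() {
  local servers_file="$1" host line
  # read hosts on fd 3 so that ssh/scp cannot swallow the host list
  while read -r host <&3; do
    [[ -z "$host" || "$host" == \#* ]] && continue   # skip blanks/comments
    run scp -r /usr/local/hbase "root@${host}:/usr/local"
    while IFS= read -r line; do
      # -n: ssh reads stdin from /dev/null instead of this loop's input
      run ssh -n "root@${host}" \
        "grep -qxF '$line' ~/.bashrc || echo '$line' >> ~/.bashrc"
    done <<< "$ENV_LINES"
  done 3< "$servers_file"
}
```

Usage: DRY_RUN=0 deploy_region_servers /usr/local/hbase/conf/regionservers. Each node still needs a fresh login (or source ~/.bashrc) before the new variables take effect in an interactive shell.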

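That completes the installation and configuration, so it is time to start the HBase services. Run start-hbase.sh on the NameNode; it also starts the managed ZooKeeper and the Region Servers listed in the regionservers file over SSH.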

root@NameNode:/usr/local# $HBASE_HOME/bin/start-hbase.sh

Let us verify the daemons running on the NameNode and on the DataNodes with jps.

root@NameNode:/usr/local# jps

5721 NameNode
5943 SecondaryNameNode
6103 ResourceManager
6217 JobHistoryServer
6752 HQuorumPeer
6813 HMaster
7031 Jps
root@NameNode:/usr/local# ssh root@DataNode1
root@DataNode1:~# jps

3869 DataNode
4004 NodeManager
4196 HRegionServer
4444 Jps

root@DataNode1:~# exit
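Scanning jps output by eye works for three nodes, but a scripted check scales better. The sketch below reads jps output (PID followed by class name) on standard input and fails if any of the named daemons is missing; check_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_daemons NAME...: read jps output on stdin and fail if any NAME is
# not running. Missing names are reported on stderr.
check_daemons() {
  local running name missing=0
  # keep only the second column (the daemon class name)
  running="$(awk 'NF >= 2 { print $2 }')"
  for name in "$@"; do
    if ! grep -qxF -- "$name" <<< "$running"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage on the NameNode: jps | check_daemons HMaster HQuorumPeer. On a DataNode the expected HBase daemon is HRegionServer.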

Next, quickly check the installed HBase version.

root@NameNode:/usr/local# hbase version
HBase 1.2.3
Source code repository git://kalashnikov.att.net/Users/stack/checkouts/hbase.git.commit revision=bd63744624a26dc3350137b564fe746df7a721a4
Compiled by stack on Mon Aug 29 15:13:42 PDT 2016
From source with checksum 0ca49367ef6c3a680888bbc4f1485d18
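In a provisioning script, the same command can make the script fail fast if the wrong release is installed. A sketch, assuming the line of interest has the form "HBase <version>" as in the output above; any lines before it (for example SLF4J warnings) are skipped. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# hbase_version_from_output: read `hbase version` output on stdin and print
# the version from the first line of the form "HBase <version>".
# Exits non-zero if no such line is found.
hbase_version_from_output() {
  awk '$1 == "HBase" && !found { print $2; found = 1 } END { exit !found }'
}

# require_hbase_version VERSION: succeed only if the installed HBase
# reports exactly VERSION (stderr is merged in case the output goes there).
require_hbase_version() {
  local actual
  actual="$(hbase version 2>&1 | hbase_version_from_output)" || return 1
  [ "$actual" = "$1" ]
}
```

Usage: require_hbase_version 1.2.3 || echo "unexpected HBase version" >&2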

Now start the HBase shell and try a few commands.

root@NameNode:~# $HBASE_HOME/bin/hbase shell

hbase(main):001:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load

hbase(main):002:0> list
TABLE
0 row(s) in 0.0610 seconds

=> []
hbase(main):003:0> exit
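The status summary can also be checked from a script by piping the command into the HBase shell. The parser below is a sketch that assumes the exact wording of the 1.2.3 summary line shown above; other versions may format it differently, so treat the pattern as an assumption. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# parse_hbase_status: read HBase shell output on stdin, find a line like
#   "1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load"
# and print "<live servers> <dead servers>".
parse_hbase_status() {
  sed -n 's/.* \([0-9][0-9]*\) servers, \([0-9][0-9]*\) dead,.*/\1 \2/p' | head -n 1
}

# cluster_healthy EXPECTED: succeed if exactly EXPECTED region servers are
# live and none are dead.
cluster_healthy() {
  local live dead
  read -r live dead < <(echo "status" | hbase shell 2>/dev/null | parse_hbase_status)
  [ "${live:-0}" -eq "$1" ] && [ "${dead:-1}" -eq 0 ]
}
```

Usage for our two DataNodes: cluster_healthy 2 && echo "all region servers up"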

Configure EdgeNode to access HBase

Let us configure the EdgeNode (client node) to access HBase; later we will use it for Hive and HBase interaction. Log on to the EdgeNode and copy the HBase directory from the NameNode with scp.

root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hbase /usr/local/

Then set the environment variables accordingly.

root@EdgeNode:/usr/local# vi ~/.bashrc

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*:.

root@EdgeNode:/usr/local# source ~/.bashrc

Next, log in to the HBase shell. We will create a table with several column families, put some data into it, and then scan and get the data.

root@NameNode:~# $HBASE_HOME/bin/hbase shell

hbase(main):001:0> create 'cdr', 'index', 'customer', 'type', 'timing', 'usage', 'correspondent', 'network'
0 row(s) in 1.8530 seconds

=> Hbase::Table - cdr
hbase(main):002:0> put 'cdr', '010', 'index:customerindex', '0'
0 row(s) in 0.3470 seconds

hbase(main):003:0> put 'cdr', '010', 'index:customercount', '1'
0 row(s) in 0.0110 seconds

hbase(main):004:0> put 'cdr', '010', 'index:patterncdrindex', '0'
0 row(s) in 0.0210 seconds

hbase(main):005:0> put 'cdr', '010', 'index:customercdrcount', '10'
0 row(s) in 0.0190 seconds

hbase(main):006:0> put 'cdr', '010', 'index:customerpatternduration', '900'
0 row(s) in 0.0200 seconds

hbase(main):007:0> put 'cdr', '010', 'index:customerprofileduration', '900'
0 row(s) in 0.0090 seconds

hbase(main):008:0> put 'cdr', '010', 'index:profilemarker', 'Profile #1'
0 row(s) in 0.0070 seconds

hbase(main):009:0> put 'cdr', '010', 'index:patternmarker', 'Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.'
0 row(s) in 1.2350 seconds

hbase(main):010:0> put 'cdr', '010', 'customer:cust_imsi', '208100000000000'
0 row(s) in 0.0210 seconds

hbase(main):011:0> put 'cdr', '010', 'customer:cust_isdn', '0600000000'
0 row(s) in 0.0110 seconds

hbase(main):012:0> put 'cdr', '010', 'customer:cust_imei', '350000000000000'
0 row(s) in 0.0240 seconds

hbase(main):013:0> put 'cdr', '010', 'customer:custoperator', 'FRAF2'
0 row(s) in 0.0160 seconds

hbase(main):014:0> put 'cdr', '010', 'type:calltype', 'MOC'
0 row(s) in 0.0340 seconds

hbase(main):015:0> put 'cdr', '010', 'type:callservice', 'Voice'
0 row(s) in 0.0090 seconds

hbase(main):016:0> list 'cdr'
TABLE
cdr
1 row(s) in 0.2020 seconds

hbase(main):017:0> scan 'cdr'
ROW                                           COLUMN+CELL
 010                                          column=customer:cust_imei, timestamp=1473821545112, value=350000000000000
 010                                          column=customer:cust_imsi, timestamp=1473821544911, value=208100000000000
 010                                          column=customer:cust_isdn, timestamp=1473821544985, value=0600000000
 010                                          column=customer:custoperator, timestamp=1473821546676, value=FRAF2
 010                                          column=index:customercdrcount, timestamp=1473821256779, value=10
 010                                          column=index:customercount, timestamp=1473821252725, value=1
 010                                          column=index:customerindex, timestamp=1473821252538, value=0
 010                                          column=index:customerpatternduration, timestamp=1473821286370, value=900
 010                                          column=index:customerprofileduration, timestamp=1473821286440, value=900
 010                                          column=index:patterncdrindex, timestamp=1473821252822, value=0
 010                                          column=index:patternmarker, timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
 010                                          column=index:profilemarker, timestamp=1473821286471, value=Profile #1
 010                                          column=type:callservice, timestamp=1473821566933, value=Voice
 010                                          column=type:calltype, timestamp=1473821566853, value=MOC
1 row(s) in 0.0670 seconds

hbase(main):018:0> get 'cdr', '010'
COLUMN                                        CELL
 customer:cust_imei                           timestamp=1473821545112, value=350000000000000
 customer:cust_imsi                           timestamp=1473821544911, value=208100000000000
 customer:cust_isdn                           timestamp=1473821544985, value=0600000000
 customer:custoperator                        timestamp=1473821546676, value=FRAF2
 index:customercdrcount                       timestamp=1473821256779, value=10
 index:customercount                          timestamp=1473821252725, value=1
 index:customerindex                          timestamp=1473821252538, value=0
 index:customerpatternduration                timestamp=1473821286370, value=900
 index:customerprofileduration                timestamp=1473821286440, value=900
 index:patterncdrindex                        timestamp=1473821252822, value=0
 index:patternmarker                          timestamp=1473821469254, value=Pattern #1 - 10 outgoing voice calls of 1-30 and toward the same corresp.
 index:profilemarker                          timestamp=1473821286471, value=Profile #1
 type:callservice                             timestamp=1473821566933, value=Voice
 type:calltype                                timestamp=1473821566853, value=MOC
14 row(s) in 0.1050 seconds

hbase(main):019:0> exit
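Typing one put per cell gets tedious. The HBase shell can also execute commands from a file passed as an argument (hbase shell <file>), so the statements can be generated instead. The helper below is a sketch with a hypothetical name; it does no escaping, so values must not contain single quotes.

```shell
#!/usr/bin/env bash
# hbase_puts TABLE ROWKEY FAMILY:QUALIFIER VALUE [FAMILY:QUALIFIER VALUE ...]
# Print one HBase shell `put` command per column/value pair.
# Values must not contain single quotes (no escaping is done).
hbase_puts() {
  local table="$1" row="$2"
  shift 2
  while [ "$#" -ge 2 ]; do
    printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$1" "$2"
    shift 2
  done
}

# Example: write a command file (ending with exit, so the shell quits
# afterwards) and run it non-interactively:
#   { hbase_puts cdr 010 type:calltype MOC type:callservice Voice; echo exit; } > cdr_puts.txt
#   hbase shell cdr_puts.txt
```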

Finally, check the HBase Master web UI at http://10.0.0.1:16010; in our setup the HBase Master runs on the NameNode.

In the next article, we will look at Apache Spark for in-memory big data analytics.
