
Set up Client Node (Gateway Node) in Hadoop Cluster

Updated on Oct 03, 2020

Once our multi-node Hadoop cluster is up and running, let us create an EdgeNode, also called a GatewayNode. Gateway nodes are the interface between the Hadoop cluster and the outside network. Edge nodes are used to run client applications and cluster administration tools.

The edge node does not have to be part of the cluster. However, if it is outside the cluster (meaning it has no Hadoop service roles running on it), it still needs some basic components, such as the Hadoop binaries and the current cluster configuration files, to submit jobs to the cluster. We will later install client tools on our EdgeNode, namely Hive, Sqoop, Flume, Pig, Oozie, etc. Before that, let's set up the EdgeNode.

Let's set up an edge node that clients can use to access the Hadoop cluster and submit jobs. Spawn a droplet in DigitalOcean (4 GB Memory / 40 GB Disk / NYC3 - Ubuntu 16.04.1 x64) named EdgeNode with Private Networking on, so that it can communicate with the other droplets. Our Hadoop cluster currently looks like this:

Name Node      NameNode     10.0.0.1
Data Node      DataNode1    10.0.100.1
Data Node      DataNode2    10.0.100.2
Client Node    EdgeNode     10.0.100.3

The setup of the EdgeNode is similar to that of any Hadoop node in the cluster, except that no Hadoop cluster services will run on it. There will be no entry for this edge node in the NameNode's Masters or Slaves file.

Setup steps involved:

Install Java Runtime Environment:

root@EdgeNode:~# add-apt-repository ppa:webupd8team/java
root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install oracle-java7-installer
root@EdgeNode:~# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

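Set up machine aliases in the hosts file:

Modify the /etc/hosts file as below, mapping each hostname to its address from the cluster table above:

127.0.0.1	localhost
10.0.100.3	EdgeNode
10.0.0.1	NameNode
10.0.100.1	DataNode1
10.0.100.2	DataNode2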
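
Editing /etc/hosts by hand works, but if the step is repeated on several machines it may help to append entries only when they are missing. The following is a minimal sketch, not part of the original setup: the function name add_host_entry is hypothetical, and it only checks whether the hostname already appears, not whether the existing address is correct.

```shell
# Append "IP<TAB>NAME" to a hosts file unless NAME already has an entry.
# Usage: add_host_entry FILE IP NAME   (hypothetical helper, for illustration)
add_host_entry() (
    file=$1 ip=$2 name=$3
    # Look for NAME as a whole field (column 2 onward) on any non-comment line.
    if awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        echo "skip: $name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
)
```

It can be tried against a scratch copy first, for example add_host_entry /tmp/hosts.test 10.0.100.3 EdgeNode, and then pointed at /etc/hosts once the result looks right.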

Set up SSH Server:

The EdgeNode requires passwordless access to the NameNode, so SSH must be set up to allow passwordless login from the EdgeNode to the NameNode machine in the cluster. The simplest way to achieve this is to generate a public/private key pair on the EdgeNode and share the public key with the master node.

root@EdgeNode:~# apt-get install openssh-server
root@EdgeNode:~# ssh-keygen -t rsa -P ""

root@EdgeNode:~# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
root@EdgeNode:~# chmod 700 ~/.ssh
root@EdgeNode:~# chmod 600 ~/.ssh/authorized_keys

Now copy the public key of the EdgeNode (/root/.ssh/id_rsa.pub) and paste it into the /root/.ssh/authorized_keys file on the NameNode.
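
Pasting the key by hand can leave duplicate lines or loose permissions behind. As a rough sketch of the same step, assuming the EdgeNode's id_rsa.pub has already been copied to the NameNode, the hypothetical helper below appends a key only if it is not yet present and then applies the same permissions used above.

```shell
# Append a public key to an authorized_keys file if it is not already listed,
# then tighten permissions on the directory and the file.
# Usage: install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE   (hypothetical helper)
install_pubkey() (
    pubkey_file=$1 auth_file=$2
    key=$(cat "$pubkey_file") || exit 1
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -x matches the whole line, -F treats the key as a fixed string
    # (base64 keys contain characters such as + and / ).
    grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    chmod 600 "$auth_file"
)
```

On the NameNode this would be called as install_pubkey /path/to/edgenode_id_rsa.pub /root/.ssh/authorized_keys, where the first path is wherever the key was copied. If password login to the NameNode is still enabled, running ssh-copy-id root@NameNode from the EdgeNode is a common alternative.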

Getting Hadoop distribution & Configuration files:

Let us copy all the Hadoop binaries and configuration files from the NameNode to our EdgeNode, so that the EdgeNode has the same Hadoop version as the cluster along with the cluster's configuration details.

root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hadoop /usr/local/

Set up Environment Variables:

Now open .bashrc and add these lines at the end of the file (in vi, press SHIFT + G to jump straight to the end of the file):

root@EdgeNode:/usr/local# vi ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.

Source the ~/.bashrc file:

root@EdgeNode:/usr/local# source ~/.bashrc
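
Before testing against the cluster, it can be useful to confirm that the new shell environment is complete. The sketch below is one possible check, not part of the original steps: check_hadoop_env is a hypothetical function name, and it assumes the hadoop launcher is expected on PATH (typically via $HADOOP_HOME/bin).

```shell
# Report which pieces of the client environment are missing.
# Prints one line per problem and returns non-zero if any were found.
check_hadoop_env() (
    status=0
    for var in JAVA_HOME HADOOP_HOME; do
        eval "value=\${$var:-}"          # read the variable named by $var
        if [ -z "$value" ]; then
            echo "unset or empty: $var"; status=1
        elif [ ! -d "$value" ]; then
            echo "not a directory: $var=$value"; status=1
        fi
    done
    # Both launchers must be reachable through PATH for the later tests.
    for cmd in java hadoop; do
        command -v "$cmd" >/dev/null 2>&1 || { echo "not on PATH: $cmd"; status=1; }
    done
    exit "$status"
)
```

Running check_hadoop_env after sourcing ~/.bashrc prints nothing when everything is in place; any line it does print names the variable or command to fix.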

Confirm Hadoop Cluster is accessible from EdgeNode:

Time to test the hadoop file system.

root@EdgeNode:~# hadoop fs -ls /

Time to test a Hadoop MapReduce job.

root@EdgeNode:~# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 4
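
Listing the root directory only proves that the NameNode answers. To also confirm that the client can write to and read from HDFS, a small round trip can be scripted. This is a sketch under assumptions: hdfs_roundtrip is a hypothetical name, and the default scratch directory /tmp/edgenode-smoke-test is an arbitrary HDFS path that the current user is assumed to be allowed to create and delete.

```shell
# Write a small file to HDFS, read it back, compare, and clean up.
# Usage: hdfs_roundtrip [HDFS_SCRATCH_DIR]   (hypothetical helper)
hdfs_roundtrip() (
    hdfs_dir=${1:-/tmp/edgenode-smoke-test}
    command -v hadoop >/dev/null 2>&1 || { echo "hadoop not on PATH" >&2; exit 127; }
    local_file=$(mktemp) || exit 1
    echo "hello from $(hostname)" > "$local_file"
    if hadoop fs -mkdir -p "$hdfs_dir" &&
       hadoop fs -put -f "$local_file" "$hdfs_dir/probe.txt" &&
       fetched=$(hadoop fs -cat "$hdfs_dir/probe.txt") &&
       [ "$fetched" = "$(cat "$local_file")" ]; then
        rc=0
    else
        rc=1
    fi
    # Remove the scratch data whether or not the comparison succeeded.
    hadoop fs -rm -r -f "$hdfs_dir" >/dev/null 2>&1
    rm -f "$local_file"
    exit "$rc"
)
```

A call such as hdfs_roundtrip && echo "HDFS read/write OK" gives a quick yes/no answer from the EdgeNode.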

Good job. Your client node is ready. Next, buckle up for installing Hive on the client node.