
Set up Client Node (Gateway Node) in Hadoop Cluster

 
Updated on Oct 03, 2020

Once our multi-node Hadoop cluster is up and running, let us create an edge node (also called a gateway node). Gateway nodes are the interface between the Hadoop cluster and the outside network. Edge nodes are used to run client applications and cluster administration tools.

The edge node does not have to be part of the cluster. However, if it is outside the cluster (meaning it has no Hadoop service roles running on it), it needs some basic components, such as the Hadoop binaries and the current cluster configuration files, to submit jobs to the cluster. We will later install client tools on our EdgeNode, namely Hive, Sqoop, Flume, Pig, Oozie, etc. Before that, let's set up the EdgeNode.

Let's set up an edge node for clients to access the Hadoop cluster and submit jobs. Spawn a droplet in DigitalOcean (4 GB Memory / 40 GB Disk / NYC3, Ubuntu 16.04.1 x64) named EdgeNode with Private Networking on, so that it can communicate with the other droplets. Our Hadoop cluster now looks like this:

Node          Hostname    IP
Name Node     NameNode    10.0.0.1
Data Node     DataNode1   10.0.100.1
Data Node     DataNode2   10.0.100.2
Client Node   EdgeNode    10.0.100.3

The setup of the EdgeNode is similar to that of any Hadoop node in the cluster, except that no Hadoop cluster services run on this node. There is no entry for this edge node in the NameNode's masters or slaves file.

Setup steps involved:

Install Java Runtime Environment:

root@EdgeNode:~# add-apt-repository ppa:webupd8team/java
root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install oracle-java7-installer
root@EdgeNode:~# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Setup machine alias in host file:

Modify the /etc/hosts file as below:

127.0.0.1	localhost
10.0.100.3	EdgeNode
10.0.0.1	NameNode
10.0.100.1	DataNode1
10.0.100.2	DataNode2
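If you prefer to script the host file change instead of editing it by hand, the following is a minimal sketch. The helper name add_host_entry is hypothetical (it is not part of Hadoop or Ubuntu); it appends an entry only when the hostname is not already mapped, so it is safe to run more than once. It assumes one entry per line, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP<TAB>hostname" to a hosts file unless the
# hostname is already mapped on a non-comment line.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Compare the hostname against every field after the IP address.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Example calls (run as root on the EdgeNode; values taken from the table above):
#   add_host_entry 10.0.0.1   NameNode
#   add_host_entry 10.0.100.1 DataNode1
#   add_host_entry 10.0.100.2 DataNode2
```

Afterwards, getent hosts NameNode should print the address you mapped.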

Setup SSH Server:

The EdgeNode requires password-less access to the NameNode. Set up SSH to allow password-less login from the EdgeNode to the NameNode machine in the cluster. The simplest way to achieve this is to generate a public/private key pair and share the public key with the master node.

root@EdgeNode:~# apt-get install openssh-server
root@EdgeNode:~# ssh-keygen -t rsa -P ""

root@EdgeNode:~# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
root@EdgeNode:~# chmod 700 ~/.ssh
root@EdgeNode:~# chmod 600 ~/.ssh/authorized_keys

Now copy the public key of the EdgeNode (/root/.ssh/id_rsa.pub) and append it to the /root/.ssh/authorized_keys file on the NameNode.
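The copy-and-paste step can also be scripted. Below is a minimal sketch of a helper, hypothetically named authorize_key, that you could run on the NameNode once the EdgeNode's id_rsa.pub file has been transferred there. It appends the key only if it is not already present and applies the same permissions used above. Where password login to the NameNode is still enabled, the standard OpenSSH utility ssh-copy-id (for example, ssh-copy-id root@NameNode run from the EdgeNode) is a common alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a public key to an authorized_keys file exactly once.
authorize_key() {
    local pubkey_file="$1" auth_file="$2"
    local key ssh_dir
    key=$(cat "$pubkey_file") || return 1
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -F fixed string, -x whole-line match, -q quiet: append only if absent.
    grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    chmod 600 "$auth_file"
}

# Example call on the NameNode (the /tmp path is an assumption for illustration):
#   authorize_key /tmp/edgenode_id_rsa.pub /root/.ssh/authorized_keys
```

To confirm the result, ssh root@NameNode hostname run from the EdgeNode should complete without asking for a password.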

Getting Hadoop distribution & Configuration files:

Let us copy the Hadoop binaries and configuration files from the NameNode to our EdgeNode, so that the EdgeNode has the same Hadoop version as the cluster along with the cluster's configuration details.

root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hadoop /usr/local/
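After the copy, it can be worth confirming that the configuration on the EdgeNode points at your NameNode. The sketch below uses a hypothetical helper, get_hadoop_property, to read a property value from a Hadoop XML configuration file. It assumes each <name> and <value> tag sits on its own line, as in typical hand-written Hadoop configs, and it assumes the cluster defines the file system address under fs.defaultFS in etc/hadoop/core-site.xml. Your cluster may use a different property name or file location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML config. Assumes one tag per line; not a full XML parser.
get_hadoop_property() {
    local name="$1" file="$2"
    awk -v n="$name" '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, "");  current = $0 }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); if (current == n) { print; exit } }
    ' "$file"
}

# Example call (property name and path are assumptions, see above):
#   get_hadoop_property fs.defaultFS /usr/local/hadoop/etc/hadoop/core-site.xml
```

Once the environment variables in the next step are set, hdfs getconf -confKey fs.defaultFS reports the same value through Hadoop's own tooling.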

Setup Environment Variables:

Now open .bashrc and add these lines at the end of the file (in vi, press SHIFT + G to jump directly to the end of the file):

root@EdgeNode:/usr/local# vi ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.

Source the ~/.bashrc file:

root@EdgeNode:/usr/local# source ~/.bashrc
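Before testing against the cluster, a quick sanity check of the new variables can save some debugging. This is a minimal sketch, with a hypothetical helper named check_hadoop_env, that only verifies that JAVA_HOME and HADOOP_HOME point at directories containing the expected executables. It does not contact the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that JAVA_HOME and HADOOP_HOME look usable.
# Returns 0 when both executables are found, 1 otherwise.
check_hadoop_env() {
    local status=0
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME does not contain bin/java: ${JAVA_HOME:-unset}" >&2
        status=1
    fi
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME does not contain bin/hadoop: ${HADOOP_HOME:-unset}" >&2
        status=1
    fi
    return "$status"
}

# Example call after sourcing ~/.bashrc:
#   check_hadoop_env && hadoop version
```

If the check passes, hadoop version should report the same release as on the NameNode, since the binaries were copied from there.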

Confirm Hadoop Cluster is accessible from EdgeNode:

Time to test the Hadoop file system.

root@EdgeNode:~# hadoop fs -ls /

Time to test a Hadoop MapReduce job.

root@EdgeNode:~# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 4

Good job. Your client node is ready. Next, we will install Hive on the client node.
