Set up Client Node (Gateway Node) in Hadoop Cluster
Once our multi-node Hadoop cluster is up and running, let us create an edge node, also called a gateway node. Gateway nodes are the interface between the Hadoop cluster and the outside network. Edge nodes are used to run client applications and cluster administration tools.
The edge node does not have to be part of the cluster. However, if it is outside the cluster (meaning it has no Hadoop service roles running on it), it needs some basic components, such as the Hadoop binaries and the current cluster configuration files, to submit jobs to the cluster. We will later install client tools on our EdgeNode, namely Hive, Sqoop, Flume, Pig, Oozie etc. Before that, let's set up the EdgeNode.
Let's set up an edge node for clients to access the Hadoop cluster and submit jobs. Spawn a droplet in DigitalOcean (4 GB Memory / 40 GB Disk / NYC3 - Ubuntu 16.04.1 x64) named EdgeNode, with Private Networking on so that it can communicate with the other droplets. Our Hadoop cluster currently looks like this:
The setup of the EdgeNode is similar to that of any Hadoop node in the cluster, except that no Hadoop cluster services run on this node. There is no entry for this edge node in the NameNode's masters or slaves file.
Setup steps involved:
Install Java Runtime Environment:
root@EdgeNode:~# apt-get update
root@EdgeNode:~# add-apt-repository ppa:webupd8team/java
root@EdgeNode:~# apt-get update
root@EdgeNode:~# apt-get install oracle-java7-installer
root@EdgeNode:~# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
Setup machine alias in host file:
Modify /etc/hosts file as below
127.0.0.1 localhost
10.0.100.3 EdgeNode
10.0.0.1 NameNode
10.0.100.1 DataNode1
10.0.100.2 DataNode2
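A quick way to confirm the aliases are in place is to look each one up in the hosts file. The snippet below is only a sketch: the helper names `lookup_alias` and `check_cluster_aliases` are hypothetical, and it assumes the four aliases shown above.

```shell
# Print the IP address mapped to a host alias in a hosts-format file.
# lookup_alias is a hypothetical helper, not part of Hadoop or Ubuntu.
lookup_alias() {
  alias_name="$1"
  hosts_file="${2:-/etc/hosts}"
  awk -v name="$alias_name" '
    $1 !~ /^#/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break                 # ignore trailing comments
        if ($i == name) { print $1; exit }   # first match wins
      }
    }
  ' "$hosts_file"
}

# Report the address found for every node alias used in this setup.
check_cluster_aliases() {
  for node in EdgeNode NameNode DataNode1 DataNode2; do
    ip="$(lookup_alias "$node" "${1:-/etc/hosts}")"
    echo "$node -> ${ip:-NOT FOUND}"
  done
}

# Usage: check_cluster_aliases
```

Any alias reported as NOT FOUND is missing from the file and should be added before moving on.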
Setup SSH Server:
The EdgeNode requires passwordless access to the NameNode. SSH must be set up to allow passwordless login from the EdgeNode to the NameNode machine in the cluster. The simplest way to achieve this is to generate a public/private key pair on the EdgeNode and share the public key with the master node.
root@EdgeNode:~# apt-get install openssh-server
root@EdgeNode:~# ssh-keygen -t rsa -P ""
root@EdgeNode:~# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
root@EdgeNode:~# chmod 700 ~/.ssh
root@EdgeNode:~# chmod 600 ~/.ssh/authorized_keys
Now copy the public key of the EdgeNode (the contents of <code>/root/.ssh/id_rsa.pub</code>) and append it to the <code>/root/.ssh/authorized_keys</code> file on the NameNode.
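If password login to the NameNode is still enabled, the copy-and-paste step can also be done with the standard OpenSSH helper ssh-copy-id. This is a sketch under that assumption. The function name `share_key_with_namenode` is hypothetical, and the default target `root@NameNode` relies on the alias from the hosts file.

```shell
# Append the EdgeNode public key to authorized_keys on the target host,
# then confirm that key-based login works without a password prompt.
# share_key_with_namenode is a hypothetical wrapper for illustration.
share_key_with_namenode() {
  target="${1:-root@NameNode}"
  # ssh-copy-id asks for the target's password once
  ssh-copy-id -i /root/.ssh/id_rsa.pub "$target" &&
    # BatchMode=yes makes ssh fail instead of prompting, so success here
    # means the key was accepted
    ssh -o BatchMode=yes "$target" hostname
}

# Usage: share_key_with_namenode
```

If the last command prints the NameNode's hostname without asking for a password, the key exchange worked.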
Getting Hadoop distribution & Configuration files:
Let us copy all the Hadoop binaries and configuration files from the NameNode to our EdgeNode, so that the EdgeNode has the same version of the Hadoop binaries as the cluster, along with the configuration details of our cluster.
root@EdgeNode:~# cd /usr/local
root@EdgeNode:/usr/local# scp -r root@NameNode:/usr/local/hadoop /usr/local/
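After the copy, it can be worth checking that the cluster configuration came across and points at the NameNode. The sketch below assumes the Hadoop 2.x layout, where the client configuration lives in etc/hadoop/core-site.xml under the copied directory, and that the file sets the fs.defaultFS property (older setups may use fs.default.name instead). The helper name `read_default_fs` is hypothetical.

```shell
# Print the value of fs.defaultFS from a core-site.xml file.
# read_default_fs is a hypothetical helper; it does simple line matching
# and expects <value> on the same line as <name> or on the next line.
read_default_fs() {
  conf_file="${1:-/usr/local/hadoop/etc/hadoop/core-site.xml}"
  grep -A1 '<name>fs.defaultFS</name>' "$conf_file" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage: read_default_fs
```

The printed URI should name the NameNode. Once the environment variables in the next step are in place, `hdfs getconf -confKey fs.defaultFS` reports the same setting through Hadoop itself.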
Setup Environment Variables:
Now open .bashrc and add these lines at the end of the file (in vi, press SHIFT + G to jump directly to the end of the file):
root@EdgeNode:/usr/local# vi ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
Source the <code>~/.bashrc</code> file:
root@EdgeNode:/usr/local# source ~/.bashrc
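Before running any Hadoop command, a short check can confirm that the variables were picked up by the current shell. This is a minimal sketch. The function name `check_hadoop_env` is hypothetical, and it only inspects JAVA_HOME and HADOOP_HOME, the two variables the other exports depend on.

```shell
# Verify that the key variables are set and point to existing directories.
# check_hadoop_env is a hypothetical helper for illustration.
check_hadoop_env() {
  status=0
  for var in JAVA_HOME HADOOP_HOME; do
    eval "value=\${$var:-}"          # read the variable named in $var
    if [ -z "$value" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$var points to a missing directory: $value"
      status=1
    else
      echo "$var=$value"
    fi
  done
  return $status
}

# Usage: check_hadoop_env && hadoop version
```

If both variables check out, `hadoop version` should print the same release as on the NameNode, since the binaries were copied from there.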
Confirm Hadoop Cluster is accessible from EdgeNode:
Time to test the hadoop file system.
root@EdgeNode:~# hadoop fs -ls /
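Listing the root directory mainly shows that the NameNode is reachable. Writing a small file and reading it back also exercises the DataNodes. The sketch below assumes the current user may write under /tmp in HDFS. The function name `hdfs_smoke_test` and the scratch directory name are hypothetical.

```shell
# Write a small file to HDFS, read it back, and clean up.
# hdfs_smoke_test is a hypothetical helper; it assumes write access
# to /tmp in HDFS for the current user.
hdfs_smoke_test() {
  dir="/tmp/edgenode-smoke-$$"       # scratch directory in HDFS
  local_file="$(mktemp)"             # scratch file on the EdgeNode
  expected="hello from EdgeNode"
  result=""
  echo "$expected" > "$local_file"

  hadoop fs -mkdir -p "$dir" &&
    hadoop fs -put -f "$local_file" "$dir/hello.txt" &&
    result="$(hadoop fs -cat "$dir/hello.txt")"

  # Clean up on both sides regardless of the outcome
  hadoop fs -rm -r -skipTrash "$dir" >/dev/null 2>&1
  rm -f "$local_file"

  [ "$result" = "$expected" ]
}

# Usage: hdfs_smoke_test && echo "HDFS read/write OK"
```

A non-zero exit status means one of the steps failed, and the messages printed by the hadoop commands indicate which one.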
Time to test a hadoop map-reduce job.
root@EdgeNode:~# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 4
Good job. Your client node is ready. Buckle up for installing Hive on the client node next.