We would like to perform practical test of Apache SQOOP import/export utility between ORACLE relational database & Apache HADOOP file system, let us quickly setup an ORACLE server. For that we will be using cloud based services/servers as we did previously using Digital Ocean.

Till now, in our Digital Ocean Private Networking Cluster we have our nodes as below:

Name NodeNameNode10.0.0.1
Data NodeDataNode110.0.100.1
Data NodeDataNode210.0.100.2
Client NodeEdgeNode10.0.100.3

1 NameNode, 2 DataNodes & 1 EdgeNode/ClientNode/GatewayNode (Hadoop Binaries & Configuration files, SQOOP, HIVE)

Installing Oracle in Linux

Let us spawn a droplet in Digital Ocean:

  • Distributions: CentOS 7.2 x64
  • Size: 2GB /2 CPUs; 40 GB SSD Disk; 3TB Transfer
  • Data center region: New York3. ( The Data center should be same as that, where your hadoop cluster & edge node resides in order to enable private networking )
  • Additional Options: Check Private Networking on. (So that Sqoop in Edgenode and Datanodes in hadoop cluster can communicate with the oracle server)
  • Hostname: oraxe

Finally create the droplet.
Once the droplet is up ssh to the machine. Firstly setup the /etc/hosts file correctly. Remove all the lines and add the below two lines, save and quit. In my case the Private IP address is

[root@oraxe ~]# vi /etc/hosts oraxe localhost

Note: The IP mentioned above in the first line is the Private IP address of the droplet in Digital Ocean not the IPv4 Public address.

Update/Install the packages for libaio, bc, flex and unzip.

[root@oraxe ~]# yum install libaio bc flex unzip

Download Oracle Express Edition from the below link:
Once you accept and license agreement Copy the URL for the download Oracle Database Express Edition 11g Release 2 for Linux x64, so that we can directly download in the server. Next we rename the file as required.

[root@oraxe ~]# wget http://download.oracle.com/otn/linux/oracle11g/xe/oracle-xe-11.2.0-1.0.x86_64.rpm.zip?AuthParam=1472010283_e868ce0dc999fce3da22a522653a185b
[root@oraxe ~]# mv oracle-xe-11.2.0-1.0.x86_64.rpm.zip\?AuthParam\=1472010283_e868ce0dc999fce3da22a522653a185b oracle-xe-11.2.0-1.0.x86_64.rpm.zip
[root@oraxe ~]# unzip -q oracle-xe-11.2.0-1.0.x86_64.rpm.zip
[root@oraxe ~]# cd Disk1
[root@oraxe Disk1]# ls	
	oracle-xe-11.2.0-1.0.x86_64.rpm  response  upgrade

Installation of Oracle mostly encounter the below error:
“This system does not meet the minimum requirements for swap space. Based on the amount of physical memory available on the system, Oracle Database 11g Express Edition requires 2048 MB of swap space. This system has 0 MB of swap space. Configure more swap space on the system and retry the installation.” Lets fix the swap space issue.

[root@oraxe Disk1]# dd if=/dev/zero of=/swapfile bs=1024M count=2
[root@oraxe Disk1]# mkswap /swapfile
[root@oraxe Disk1]# free -m | grep Swap
[root@oraxe Disk1]# swapon /swapfile
[root@oraxe Disk1]# free -m | grep Swap

Now Let us start the Oracle Installer.

[root@oraxe Disk1]# rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm

Once the Database Installation Completed successfully, we need to configure the database. Mostly for all the options we will choose the default value, simply by pressing Enter.

[root@oraxe Disk1]# /etc/init.d/oracle-xe configure
HTTP Port: 8088
Listener Port: 1521
SYS/SYSTEM Password: (e.g. hadoop 1234)
Start Oracle on boot: y

Once done you should see the success status as below
Starting Oracle Net Listener...Done  
Configuring database...Done  
Starting Oracle Database 11g Express Edition instance...Done  
Installation completed successfully.

The installation created the directory /u01 under which Oracle XE is installed. To set the required Oracle environment variables, use the oracle_env.sh the script included under cd /u01/app/oracle/product/11.2.0/xe/bin

[root@oraxe Disk1]# cd /u01/app/oracle/product/11.2.0/xe/bin
[root@oraxe /u01/app/oracle/product/11.2.0/xe/bin]# . ./oracle_env.sh

This sets the environment for our current session. But in order to set the environment permanently, append the following line to the .bashrc file append the below line, save and quit.

[root@oraxe /u01/app/oracle/product/11.2.0/xe/bin]# vi .bashrc

	. /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh

Now let’s try to access Oracle SQL*Plus utility:

[root@oraxe /u01/app/oracle/product/11.2.0/xe/bin]# sqlplus /nolog
SQL*Plus: Release Production on Wed Sep 21 08:17:26 2011
Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL> connect sys/Password as sysdba

Now we will unlock & reset the password of a default database called HR, so that we can use the same from sqoop.


Login to this HR account later & check the tables/data, as we will be using these tables in our sqoop import job.
To allow remote access to Oracle 11g XE GUI (as well as Application Express GUI) issue the following from SQL*Plus

SQL> exit

We will now be able to access the Oracle 11g XE Home Page GUI at http://localhost:8080/apex/f?p=4950:1
Replace localhost above with your IP as required and log in as SYSTEM and the password.

We are done with the Oracle Installation.
We need to map the Private IP address & Hostname of the Oracle Server in all the Nodes in our cluster. Once done, it’s time for us to login to our client node or edge node where we have our SQOOP installed. In the next article we will see SQOOP in action with Oracle RDBMS.

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Introduction to Apache Hadoop

    The Apache Hadoop is next big data platform. Apache Hadoop is an open-source, java-based framework software for reliable, scalable & distributed computing. The Apache Hadoop allows distributed processing of very large data sets across clusters of...

  • Hadoop DataLake Implementation Part 9

    In this article we will load our final fact table i.e. stock.

  • How to Setup Hadoop Multi Node Cluster - Step By Step

    Setting up Hadoop in a single machine is easy, but no fun. Why? Because Hadoop is not meant for a single machine. Hadoop is meant to run on a computing cluster comprising of many machines. Running HDFS and MapReduce on a single machine is great for...

  • Oracle Installation for SQOOP Import

    We would like to perform practical test of Apache SQOOP import/export utility between ORACLE relational database & Apache HADOOP file system, let us quickly setup an ORACLE server. For that we will be using cloud based services/servers as we did...

  • Hadoop MapReduce Basics

    The Hadoop, since its inception is changing the way the enterprises store, process and analyse data. MapReduce is the core part of the Hadoop framework and we can also call it as the core processing engine of Hadoop. It is a programming model...

  • Hadoop DataLake Implementation Part 4

    Now that our dummy OLTP source system & Hadoop HDFS directory structure is ready, we will first load the ‘dates’ data file in HDFS and further to a hive table.

  • Hadoop DataLake Implementation Part 5

    In this article we will load the showroom master data from MySQL source system to HDFS using Sqoop as SCD Type 1.

  • Hadoop DataLake Implementation Part 7

    In this article we will load our master data table ‘Product’ as Slowly Changing Dimension of Type 2 to maintain full history, so as to analyze the sales and stocks data with reference to the historical master data.

  • Install HBASE in Hadoop Cluster

    Apache HBase provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. HBase is used in cases...

  • Stream Webserver Log into Hdfs using FLUME

    In this article we will use Apache Flume to gather stream access log data from our remote Web Server into Hadoop Distributed File System. We will be analyzing the access log in a real-time basis. So we have to setup Flume such that it collects the...