SQOOP Import from Oracle
In this article we will use Apache Sqoop to import data from an Oracle database. Now that we have an Oracle server ready in our cluster, let us log in to the EdgeNode. We will first configure Sqoop to import the data into the HDFS file system, followed by a direct import into Hive tables.
SQOOP Import to HDFS
Now it’s time to explore Sqoop. Let us first invoke the Sqoop list-databases utility to list the databases accessible to a user in Oracle.
root@EdgeNode:~# cd $SQOOP_HOME/sqoop_work
root@EdgeNode:/usr/local/sqoop/sqoop_work# export HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop list-databases -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --connect "jdbc:oracle:thin:@10.0.100.4:1521:XE" --username "SYSTEM" --password "hadoop1234"
To avoid exposing the password on the command line, use the -P option, which will prompt for the password instead:
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop list-databases -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --connect "jdbc:oracle:thin:@10.0.100.4:1521:XE" --username "SYSTEM" -P
Alternatively, we can use --options-file, which keeps the connection string, username and password in a separate file:
root@EdgeNode:/usr/local/sqoop/sqoop_work# vi import_oracle.txt
--connect
"jdbc:oracle:thin:@10.0.100.4:1521:XE"
--username
"SYSTEM"
--password
"hadoop1234"
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop list-databases -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --options-file /usr/local/sqoop/sqoop_work/import_oracle.txt
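Another alternative worth noting is --password-file, which reads the password from a permission-restricted file in HDFS. The sketch below assumes Sqoop 1.4.4 or later; the file name oracle.password and its HDFS location are only placeholders:
root@EdgeNode:/usr/local/sqoop/sqoop_work# echo -n "hadoop1234" > oracle.password
root@EdgeNode:/usr/local/sqoop/sqoop_work# hadoop fs -put oracle.password /user/root/oracle.password
root@EdgeNode:/usr/local/sqoop/sqoop_work# hadoop fs -chmod 400 /user/root/oracle.password
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop list-databases -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --connect "jdbc:oracle:thin:@10.0.100.4:1521:XE" --username "SYSTEM" --password-file /user/root/oracle.password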
Now let us use another Sqoop utility, list-tables, to list the tables available to a database user.
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop list-tables -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --connect jdbc:oracle:thin:@10.0.100.4:1521/xe --username "HR" --password "hrpass1234"
Now it’s finally time to bring some data from an Oracle table into HDFS using Sqoop import. There are many options to control the import process, which we will discuss in more detail later. For now, take a look at the import statement below:
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop import -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" \
--bindir ./ \
--connect "jdbc:oracle:thin:@10.0.100.4:1521/XE" \
--username "HR" \
--password "hrpass1234" \
--table HR.JOBS \
--as-avrodatafile \
--compression-codec snappy \
--target-dir /user/root/jobs \
--fetch-size 1000 \
--append \
--num-mappers 1 \
--validate
Once the Sqoop import is successful, let us check the data in HDFS.
root@EdgeNode:/usr/local/sqoop/sqoop_work# cd $HADOOP_HOME/bin
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -ls /user/root/jobs
/user/root/jobs/part-m-00000.avro
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -cat /user/root/jobs/part-m-00000.avro
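Since Avro is a binary container format, hadoop fs -cat will show the embedded schema followed by mostly unreadable bytes. To inspect the records as JSON, one option (a sketch; the avro-tools jar version and path below are assumptions, so adjust them to whatever is installed) is to copy a part file to the local file system and run the avro-tools tojson command on it:
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -copyToLocal /user/root/jobs/part-m-00000.avro /tmp/part-m-00000.avro
root@EdgeNode:/usr/local/hadoop/bin# java -jar /usr/local/avro/avro-tools-1.8.1.jar tojson /tmp/part-m-00000.avro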
Hive Table
The next step is to load the data into our Hive database. Since we imported the data as Avro, we will create an Avro-backed table in Hive. For that, we first need to move the Avro schema file into HDFS.
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -mkdir -p /user/root/schema
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -copyFromLocal /usr/local/sqoop/sqoop_work/HR_JOBS.avsc /user/root/schema/jobs.avsc
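For reference, HR_JOBS.avsc is generated by Sqoop during the import, in the working directory (here /usr/local/sqoop/sqoop_work). Its contents will look roughly like the sketch below; the exact field names and Avro types depend on how Sqoop maps the source columns, so treat this as illustrative only:
{
  "type" : "record",
  "name" : "HR_JOBS",
  "fields" : [
    {"name" : "JOB_ID", "type" : ["null", "string"], "default" : null},
    {"name" : "JOB_TITLE", "type" : ["null", "string"], "default" : null},
    {"name" : "MIN_SALARY", "type" : ["null", "int"], "default" : null},
    {"name" : "MAX_SALARY", "type" : ["null", "int"], "default" : null}
  ]
}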
Now let us create and load the Hive table.
root@EdgeNode:/usr/local/hadoop/bin# cd $HIVE_HOME/hive_work
root@EdgeNode:/usr/local/hive/hive_work# hive
hive> SHOW DATABASES;
hive> USE default;
hive> CREATE TABLE jobs
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/root/schema/jobs.avsc');
hive> LOAD DATA INPATH '/user/root/jobs/part-*.avro' OVERWRITE INTO TABLE jobs;
hive> select count(1) from jobs;
hive> select * from jobs limit 10;
hive> quit;
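As a side note, instead of LOAD DATA we could also have pointed Hive directly at the import directory with an external table, along the lines of the sketch below (jobs_ext is a hypothetical table name; with an external table, dropping the table leaves the files under /user/root/jobs in place):
hive> CREATE EXTERNAL TABLE jobs_ext
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/root/jobs'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/root/schema/jobs.avsc');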
SQOOP Import to Hive
Let us now import another table using Sqoop and load it directly into our Hive warehouse.
root@EdgeNode:/usr/local/hive/hive_work# cd $SQOOP_HOME/sqoop_work
root@EdgeNode:/usr/local/sqoop/sqoop_work# export HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop import -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" \
--bindir ./ \
--connect "jdbc:oracle:thin:@10.0.100.4:1521/XE" \
--username "HR" \
--password "hrpass1234" \
--num-mappers 1 \
--as-textfile \
--target-dir /user/hive/warehouse/job_history \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--table HR.JOB_HISTORY \
--hive-import \
--hive-table job_history
This will create the job_history table in the default Hive database, as configured in hive-site.xml. Now let us check the Hive table.
root@EdgeNode:/usr/local/sqoop/sqoop_work# cd $HIVE_HOME/hive_work
root@EdgeNode:/usr/local/hive/hive_work# hive
hive> SHOW DATABASES;
hive> USE default;
hive> select count(1) from job_history;
hive> select * from job_history;
hive> quit;
Note that the delimited data files in HDFS backing the Hive table will be under the warehouse directory configured in hive-site.xml:
root@EdgeNode:/usr/local/hive/hive_work# cd $HADOOP_HOME/bin
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -ls /user/hive/warehouse/job_history
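To quickly confirm the delimiter and layout, we can also peek at the first few rows of the underlying files (the part-* glob simply covers whatever part files Sqoop produced):
root@EdgeNode:/usr/local/hadoop/bin# hadoop fs -cat /user/hive/warehouse/job_history/part-* | head -5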
SQOOP Import Database
Finally, one more Sqoop utility we will check at this point is import-all-tables. When we want to bring an entire database schema into HDFS, we can use the import-all-tables utility.
root@EdgeNode:/usr/local/hadoop/bin# cd $SQOOP_HOME/sqoop_work
root@EdgeNode:/usr/local/sqoop/sqoop_work# export HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom
root@EdgeNode:/usr/local/sqoop/sqoop_work# sqoop import-all-tables -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" \
--bindir ./ \
--connect "jdbc:oracle:thin:@10.0.100.4:1521/XE" \
--username "HR" \
--password "hrpass1234" \
--num-mappers 1 \
--exclude-tables "HR.JOBS,HR.JOB_HISTORY" \
--as-textfile \
--direct \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--hive-import
Now let us check the Hive tables:
root@EdgeNode:/usr/local/sqoop/sqoop_work# cd $HIVE_HOME/hive_work
root@EdgeNode:/usr/local/hive/hive_work# hive
hive> SHOW DATABASES;
hive> USE default;
hive> show tables;
hive> select count(1) from inventory;
hive> select * from inventory limit 10;
hive> quit;
In our next article we will look at another Sqoop utility called merge. We will perform an incremental extraction from an Oracle source table, followed by a merge.