
Hadoop DataLake Implementation Part 10

 
Updated on Oct 03, 2020

In this article we will create Oozie workflows to orchestrate the daily loading of the Showroom dimension table from the MySQL source to HDFS using Sqoop, followed by loading the data from HDFS into the Hive warehouse, and finally housekeeping & archival. We will then build a similar workflow for the Sales fact table.

Setup Oozie Workflow

Log in to the Ambari Web UI with user edw_user & password hadoop. Open the Workflow Manager View to develop the workflows that will load the data lake on a daily basis.

1. Let us first create a daily load workflow for Showroom Dimension. Click on Create Workflow, and name it as Wf_Showroom.

  • Add a Sqoop Action Node, namely sq_stg_showroom, between the Start & End nodes. Next click on the Settings icon for the Sqoop node.

On the General Tab, set the Send As option to Args. Add the following five Args –

job
--exec
jb_stg_showroom
--meta-connect
jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop

Next click on Settings and enter the Workflow Parameters configurations as below:

oozie.action.sharelib.for.sqoop=sqoop
  • Next add a Hive2 Action Node, namely hv_dim_showroom after the node sq_stg_showroom. Next click on the Settings icon for the Hive2 Node. On the General Tab, select Hive Option as Script.

Add the script file path as – /user/edw_user/sales/scripts/load_showroom.hql

Enter the jdbc-url as jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics, the username as edw_user and the password as hadoop.

  • Next add a Shell Action Node, namely sh_arc_showroom, after the node hv_dim_showroom. Next click on the Settings icon for the Shell node.

Under General, set exec to the script file path /user/edw_user/sales/scripts/archive_showroom.sh

Under Advanced Properties, set File to /user/edw_user/sales/scripts/archive_showroom.sh (a sketch of the complete action chain appears at the end of this step).

Finally click on Save with Workflow Path as /user/edw_user/sales/workflows/showroom_workflow.xml, check the Overwrite option. Next click on Submit Job, check Overwrite & Run on submit, to test the workflow.
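For reference, the design saved above corresponds roughly to an Oozie workflow definition along the following lines. This is only a hand-written sketch, assuming the standard Sqoop, Hive2 and Shell action schemas and the usual ${jobTracker}/${nameNode} placeholders; the XML actually generated by the Workflow Manager View may name and order elements slightly differently.

<workflow-app name="Wf_Showroom" xmlns="uri:oozie:workflow:0.5">
    <start to="sq_stg_showroom"/>

    <!-- Stage the showroom table from MySQL to HDFS by executing the saved Sqoop metastore job -->
    <action name="sq_stg_showroom">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>job</arg>
            <arg>--exec</arg>
            <arg>jb_stg_showroom</arg>
            <arg>--meta-connect</arg>
            <arg>jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop</arg>
        </sqoop>
        <ok to="hv_dim_showroom"/>
        <error to="kill"/>
    </action>

    <!-- Load the staged data into the Hive showroom dimension -->
    <action name="hv_dim_showroom">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics</jdbc-url>
            <password>hadoop</password>
            <script>/user/edw_user/sales/scripts/load_showroom.hql</script>
        </hive2>
        <ok to="sh_arc_showroom"/>
        <error to="kill"/>
    </action>

    <!-- Housekeeping: archive the processed staging files; the script is shipped to the action node -->
    <action name="sh_arc_showroom">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>archive_showroom.sh</exec>
            <file>/user/edw_user/sales/scripts/archive_showroom.sh#archive_showroom.sh</file>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>

    <kill name="kill">
        <message>Wf_Showroom failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Note that the Workflow Parameter oozie.action.sharelib.for.sqoop=sqoop entered under Settings is not part of this XML; it is passed with the job configuration at submission time so that the Sqoop sharelib is available to the launcher.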

2. Now we will create a daily load workflow for the Sales Fact. Click on Create Workflow and name it Wf_Sales.

  • Add a Sqoop Action Node, namely sq_stg_sales, between the Start & End nodes. Next click on the Settings icon for the Sqoop node.

On the General Tab, set the Send As option to Args. Add the following five Args –

job
--exec
jb_stg_sales
--meta-connect
jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop

Next click on Settings and enter the Workflow Parameters configurations as below:

oozie.action.sharelib.for.sqoop=sqoop
  • Next add a Pig Action Node, namely pg_stg_sales, after the node sq_stg_sales. Next click on the Settings icon for the Pig node.

On the General Tab, under Script browse and enter the file path as /user/edw_user/sales/scripts/transform_sales.pig

Under Advanced Properties, set File to /user/edw_user/sales/scripts/transform_sales.pig (this action node is sketched at the end of this step).

Under Configuration enter the following parameters:

hive.metastore.uris=thrift://sandbox-hdp.hortonworks.com:9083
mapreduce.map.output.compress=true

Next click on Settings and enter the Workflow Parameters configurations as below:

oozie.action.sharelib.for.sqoop=sqoop
oozie.action.sharelib.for.pig=pig,hive,hcatalog
  • Next add a Hive2 Action Node, namely hv_fact_sales, after the node pg_stg_sales. Next click on the Settings icon for the Hive2 Node. On the General Tab, select Hive Option as Script.

Add the script file path as – /user/edw_user/sales/scripts/load_sales.hql

Enter the jdbc-url as jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics, the username as edw_user and the password as hadoop.

  • Next add a Shell Action Node, namely sh_arc_sales, after the node hv_fact_sales. Next click on the Settings icon for the Shell node.

Under General, set exec to the script file path /user/edw_user/sales/scripts/archive_sales.sh

Under Advanced Properties, set File to /user/edw_user/sales/scripts/archive_sales.sh

Finally click on Save with Workflow Path as /user/edw_user/sales/workflows/sales_workflow.xml, check the Overwrite option. Next click on Submit Job, check Overwrite & Run on submit, to test the workflow.
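The saved Wf_Sales definition follows the same pattern as Wf_Showroom, with the extra Pig transformation between the Sqoop staging action and the Hive2 fact load. A hedged sketch of just that additional action node, as it might appear in sales_workflow.xml, is shown below; the names mirror the settings entered above, and the generated XML may differ in detail.

    <!-- Transform the staged sales data with Pig before loading the fact table -->
    <action name="pg_stg_sales">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>hive.metastore.uris</name>
                    <value>thrift://sandbox-hdp.hortonworks.com:9083</value>
                </property>
                <property>
                    <name>mapreduce.map.output.compress</name>
                    <value>true</value>
                </property>
            </configuration>
            <script>/user/edw_user/sales/scripts/transform_sales.pig</script>
            <file>/user/edw_user/sales/scripts/transform_sales.pig</file>
        </pig>
        <!-- On success, continue to the Hive2 load of the sales fact -->
        <ok to="hv_fact_sales"/>
        <error to="kill"/>
    </action>

Because this Pig script reads Hive metadata through HCatalog, the extra Workflow Parameter oozie.action.sharelib.for.pig=pig,hive,hcatalog makes the Hive and HCatalog jars available to the Pig launcher at run time.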
