Hadoop DataLake Implementation Part 10
In this article we will create Oozie workflows to orchestrate the daily loading of the Showroom dimension and Sales fact tables from the MySQL source into HDFS using Sqoop, followed by loading the data from HDFS into the Hive warehouse, and finally housekeeping & archival of the staged files.
Setup Oozie Workflow
Log in to the Ambari Web UI with user edw_user and password hadoop. Open the Workflow Manager View to develop the workflows that load the data lake on a daily basis.
1. Let us first create a daily load workflow for the Showroom Dimension. Click on Create Workflow and name it Wf_Showroom.
- Add a Sqoop Action Node, named sq_stg_showroom, between the Start & End nodes. Next click on the Settings icon for the Sqoop node.
On the General tab, set the Send As option to Args. Add the following five Args:
job
--exec
jb_stg_showroom
--meta-connect
jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop
Next click on Settings and enter the Workflow Parameters configuration below (a sketch of the generated Sqoop action XML follows this step):
oozie.action.sharelib.for.sqoop=sqoop
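Behind the scenes, the Workflow Manager View turns this node into an Oozie Sqoop action in the saved workflow XML. A minimal sketch of what that action looks like is given below; the schema version, the kill-node name and the error transition are illustrative assumptions, and the sharelib property above is passed as a job property at submission time:

<!-- Illustrative sketch of the Sqoop action generated for sq_stg_showroom.
     It executes the saved Sqoop metastore job jb_stg_showroom. -->
<action name="sq_stg_showroom">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <arg>job</arg>
        <arg>--exec</arg>
        <arg>jb_stg_showroom</arg>
        <arg>--meta-connect</arg>
        <arg>jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop</arg>
    </sqoop>
    <ok to="hv_dim_showroom"/>
    <error to="kill"/>
</action>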
- Next add a Hive2 Action Node, named hv_dim_showroom, after the node sq_stg_showroom. Click on the Settings icon for the Hive2 node. On the General tab, set the Hive Option to Script.
Add the script file path as /user/edw_user/sales/scripts/load_showroom.hql
Enter the jdbc-url as jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics and the username as edw_user
Enter the password as hadoop
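For reference, a sketch of the Hive2 action that the view would generate for this node is shown below; again the schema version and transitions are assumptions, and the HiveQL script itself was created in an earlier part of this series:

<!-- Illustrative sketch of the Hive2 action generated for hv_dim_showroom.
     It runs the load script through HiveServer2 against the sales_analytics database. -->
<action name="hv_dim_showroom">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <jdbc-url>jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics</jdbc-url>
        <password>hadoop</password>
        <script>/user/edw_user/sales/scripts/load_showroom.hql</script>
    </hive2>
    <ok to="sh_arc_showroom"/>
    <error to="kill"/>
</action>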
- Next add a Shell Action Node, named sh_arc_showroom, after the node hv_dim_showroom. Click on the Settings icon for the Shell node.
Under General, set Exec to the script file path /user/edw_user/sales/scripts/archive_showroom.sh
Under Advanced Properties, add a File entry pointing to /user/edw_user/sales/scripts/archive_showroom.sh
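A sketch of the corresponding Shell action is given below. The <file> element localizes the HDFS script into the YARN container's working directory before it is executed; the sketch uses the script basename in <exec>, whereas the view simply stores whatever was typed in the Exec field:

<!-- Illustrative sketch of the Shell action generated for sh_arc_showroom.
     The script archives and house-keeps the staged showroom files. -->
<action name="sh_arc_showroom">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>archive_showroom.sh</exec>
        <file>/user/edw_user/sales/scripts/archive_showroom.sh#archive_showroom.sh</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>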
Finally click on Save with the Workflow Path /user/edw_user/sales/workflows/showroom_workflow.xml and check the Overwrite option. Next click on Submit Job, check Overwrite & Run on Submit, to test the workflow.
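Alternatively, the saved workflow can be submitted with the Oozie command-line client instead of the view. A minimal sketch follows, assuming the default HDP sandbox endpoints (NameNode on 8020, ResourceManager on 8050, Oozie server on 11000); the job.properties file name is illustrative:

# job.properties (illustrative) - points Oozie at the saved workflow definition
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8050
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop
oozie.wf.application.path=${nameNode}/user/edw_user/sales/workflows/showroom_workflow.xml

# Submit and run the workflow as edw_user
oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -config job.properties -run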
2. Now we will create a daily load workflow for the Sales Fact. Click on Create Workflow and name it Wf_Sales.
- Add a Sqoop Action Node, named sq_stg_sales, between the Start & End nodes. Next click on the Settings icon for the Sqoop node.
On the General tab, set the Send As option to Args. Add the following five Args:
job
--exec
jb_stg_sales
--meta-connect
jdbc:hsqldb:hsql://sandbox-hdp.hortonworks.com:16001/sqoop
Next click on Settings and enter the Workflow Parameters configuration as below:
oozie.action.sharelib.for.sqoop=sqoop
- Next add a Pig Action Node, named pg_stg_sales, after the node sq_stg_sales. Next click on the Settings icon for the Pig node.
On the General tab, under Script, browse and enter the file path /user/edw_user/sales/scripts/transform_sales.pig
Under Advanced Properties, add a File entry pointing to /user/edw_user/sales/scripts/transform_sales.pig
Under Configuration enter the following parameters:
hive.metastore.uris=thrift://sandbox-hdp.hortonworks.com:9083
mapreduce.map.output.compress=true
Next click on Settings and enter the Workflow Parameters configuration below (a sketch of the generated Pig action XML follows this step):
oozie.action.sharelib.for.sqoop=sqoop
oozie.action.sharelib.for.pig=pig,hive,hcatalog
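The pig,hive,hcatalog share libraries configured above are what allow the Pig script to read and write Hive-managed tables via HCatalog. A sketch of the Pig action that the view would generate for this node is shown below; as before, the transitions and kill-node name are illustrative assumptions:

<!-- Illustrative sketch of the Pig action generated for pg_stg_sales.
     The configuration block passes the Hive metastore URI so HCatLoader/HCatStorer
     can resolve Hive tables, and enables map-output compression. -->
<action name="pg_stg_sales">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>hive.metastore.uris</name>
                <value>thrift://sandbox-hdp.hortonworks.com:9083</value>
            </property>
            <property>
                <name>mapreduce.map.output.compress</name>
                <value>true</value>
            </property>
        </configuration>
        <script>/user/edw_user/sales/scripts/transform_sales.pig</script>
        <file>/user/edw_user/sales/scripts/transform_sales.pig</file>
    </pig>
    <ok to="hv_fact_sales"/>
    <error to="kill"/>
</action>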
- Next add a Hive2 Action Node, named hv_fact_sales, after the node pg_stg_sales. Click on the Settings icon for the Hive2 node. On the General tab, set the Hive Option to Script.
Add the script file path as /user/edw_user/sales/scripts/load_sales.hql
Enter the jdbc-url as jdbc:hive2://sandbox-hdp.hortonworks.com:10000/sales_analytics and the username as edw_user
Enter the password as hadoop
- Next add a Shell Action Node, named sh_arc_sales, after the node hv_fact_sales. Click on the Settings icon for the Shell node.
Under General, set Exec to the script file path /user/edw_user/sales/scripts/archive_sales.sh
Under Advanced Properties, add a File entry pointing to /user/edw_user/sales/scripts/archive_sales.sh
Finally click on Save with the Workflow Path /user/edw_user/sales/workflows/sales_workflow.xml and check the Overwrite option. Next click on Submit Job, check Overwrite & Run on Submit, to test the workflow.
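Putting it together, the saved sales_workflow.xml chains the four actions sequentially. A trimmed, illustrative skeleton of the control flow is shown below (action bodies are replaced by comments; the workflow schema version and kill-node wording are assumptions):

<!-- Skeleton of Wf_Sales: Sqoop staging, Pig transformation, Hive load, then shell archival.
     Any action failure routes to the kill node. -->
<workflow-app name="Wf_Sales" xmlns="uri:oozie:workflow:0.5">
    <start to="sq_stg_sales"/>
    <action name="sq_stg_sales">
        <!-- Sqoop action body as configured above -->
        <ok to="pg_stg_sales"/>
        <error to="kill"/>
    </action>
    <action name="pg_stg_sales">
        <!-- Pig action body as configured above -->
        <ok to="hv_fact_sales"/>
        <error to="kill"/>
    </action>
    <action name="hv_fact_sales">
        <!-- Hive2 action body as configured above -->
        <ok to="sh_arc_sales"/>
        <error to="kill"/>
    </action>
    <action name="sh_arc_sales">
        <!-- Shell action body as configured above -->
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Wf_Sales failed at [${wf:lastErrorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>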