Logo DWBI.org Login / Sign Up
Sign Up
Have Login?
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Login
New Account?
Recovery
Go to Login
By continuing you indicate that you agree to Terms of Service and Privacy Policy of the site.
Pentaho Data Integration

Using Pentaho Row Normaliser step

 
Updated on Oct 03, 2020

In this article we will learn how to use Pentaho Data Integration Row Normalizer Step. Normalizer step is used to normalize data from pivot or denormalized table. It allows us to change how the relationship between rows is displayed. For each value in each pivot column, Row Normalizer produces a row in the output data set. We can create new field sets to specify more than one pivot columns. It basically convert Columns to Rows.

Let us consider we have source data file as Excel sales budget for different stores based on year & quarters. We may want to transform the sales budget data based on the quarters for each of the stores.

Let's consider we have the following data in excel as yearly & quarterly sales budget:

STOREYEARQUARTER1QUARTER2QUARTER3QUARTER4
Clementi201650000700006500085000
River Valley201630000290003300031000
Clementi201755000750007000088000

So in this scenario we can use the Row Normaliser step to normalize the pivot data and will further load into Oracle database. Let's take a look at the expected normalized dataset below:

STOREYEARQUARTERBUDGET
Clementi2016Q150000
Clementi2016Q270000
Clementi2016Q365000
Clementi2016Q485000
River Valley2016Q130000
River Valley2016Q229000
River Valley2016Q333000
River Valley2016Q431000
Clementi2017Q155000
Clementi2017Q275000
Clementi2017Q370000
Clementi2017Q488000

Below are the steps we will be covering in this use case scenario.

  • Read excel data file, using Microsoft Excel Input
  • Normalize pivot data, using Row Normalizer
  • Write normalized data to Oracle database table, using Table output

Find below the use case transformation:

Transformation
Transformation

First in order to read excel file we will be using Microsoft excel input. Let's name the step as XL_Budget. Select spread sheet type as Excel 2007 XLSX Apache POI. Browse and locate the input excel file and click the Add button.

Microsoft Excel input
Microsoft Excel input

Next go to the Sheets tab and click on Get sheetname(s)... Select the correct data sheet. In my case it is Sheet1. In the contents tab check the Header option, as we have a header record in our excel sheet.

Microsoft Excel input Sheets
Microsoft Excel input Sheets

In the Fields tab, click the button Get fields from header row. Check and modify the Fields name & data type. Preview your records to confirm that PDI is able to read the excel file content as expected.

Microsoft Excel input Fields
Microsoft Excel input Fields

Next we are using a Row Normaliser step. Select the Type field as the QUARTER. This will be Output field name. Next in the Fields section map the Source fields namely Quarter1,2,3 & 4. Let's put the Type as Q1, Q2, Q3 & Q4. So the QUARTER output field will have these types as the output values. Let's name the corresponding new fields as BUDGET. This field will give the amount figures corresponding to the quarter types.

Row Normaliser
Row Normaliser

In order to write the normalized data into relational database, in our case Oracle RDBMS, we will use Table output step. Lets name this step as STG_BUDGET. At first we need to create the connection object. Click New or Wizard and create a connection.

Let's name the connection as Con_Staging. Select Connection type as Oracle & Driver as Native (JDBC). The Oracle jdbc driver might not be available to PDI. In that case you have to download oracle jdbc driver ojdbc6.jar and place in the PDI lib directory, i.e. data-integration/lib/ . Next define the hostname, database name, port, username and password to connect to your oracle instance. Test to check the connection and click OK.

Database connection
Database connection

Once the database connection is ready, browse the relevant target schema. Let's keep the target table name as STG_BUDGET. Select Truncate table option.

Table output
Table output

Next select Specify database fields and click on Get fields button under the Database fields tab. This will list the fields available in the prior Step. Next click on SQL. This will show the required DDL to create the table in the Target schema. Once verified okay, click on Execute to create the table directly in database. Note the database user for this connection object should have DDL create grant. That's all. So the target RDBMS table is ready to be loaded.

Table output - Database fields
Table output - Database fields

Let us run the Transformation from Spoon and validate the output data in the table from the Preview tab of the Execution Results section.

Execution Results
Execution Results

So we have successfully completed excel read, normalize pivot data and RDBMS write. In the next article we will check the reverse scenario on data denormalization using Denormalizer.

PrimeChess

PrimeChess.org

PrimeChess.org makes elite chess training accessible and affordable for everyone. For the past 6 years, we have offered free chess camps for kids in Singapore and India, and during that time, we also observed many average-rated coaches charging far too much for their services.

To change that, we assembled a team of top-rated coaches including International Masters (IM) or coaches with multiple IM or GM norms, to provide online classes starting from $50 per month (8 classes each month + 4 tournaments)

This affordability is only possible if we get more students. This is why it will be very helpful if you could please pass-on this message to others.

Exclucively For Indian Residents: 
Basic - ₹1500
Intermediate- ₹2000
Advanced - ₹2500

Top 10 Articles