In this article we will learn how to use the Pentaho Data Integration Row Normaliser step. The Row Normaliser step is used to normalize data from a pivoted or denormalized table: it changes how the relationship between rows is displayed. For each value in each pivot column, Row Normaliser produces a row in the output data set, and we can create field sets to specify more than one pivot column. In short, it converts columns to rows.

Suppose our source data is an Excel file containing the sales budget for different stores by year and quarter. We want to transform the sales budget data so that there is one row per quarter for each store.

Let's say we have the following yearly and quarterly sales budget data in Excel:

| STORE        | YEAR | QUARTER1 | QUARTER2 | QUARTER3 | QUARTER4 |
|--------------|------|----------|----------|----------|----------|
| Clementi     | 2016 | 50000    | 70000    | 65000    | 85000    |
| River Valley | 2016 | 30000    | 29000    | 33000    | 31000    |
| Clementi     | 2017 | 55000    | 75000    | 70000    | 88000    |

In this scenario we can use the Row Normaliser step to normalize the pivoted data and then load it into an Oracle database. The expected normalized dataset looks like this (first few rows shown):

| STORE        | YEAR | QUARTER | BUDGET |
|--------------|------|---------|--------|
| Clementi     | 2016 | Q1      | 50000  |
| Clementi     | 2016 | Q2      | 70000  |
| River Valley | 2016 | Q1      | 30000  |
| Clementi     | 2017 | Q1      | 55000  |
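
This unpivot is conceptually the same as a pandas `melt`. As a stand-in sketch (pandas is not part of PDI and is used here purely for illustration, with the sample data above):

```python
import pandas as pd

budget = pd.DataFrame({
    "STORE": ["Clementi", "River Valley", "Clementi"],
    "YEAR": [2016, 2016, 2017],
    "QUARTER1": [50000, 30000, 55000],
    "QUARTER2": [70000, 29000, 75000],
    "QUARTER3": [65000, 33000, 70000],
    "QUARTER4": [85000, 31000, 88000],
})

# melt() unpivots the four quarter columns into (QUARTER, BUDGET) rows,
# much like the Row Normaliser step does.
normalized = budget.melt(
    id_vars=["STORE", "YEAR"],
    value_vars=["QUARTER1", "QUARTER2", "QUARTER3", "QUARTER4"],
    var_name="QUARTER",
    value_name="BUDGET",
)

# Shorten QUARTER1..QUARTER4 to Q1..Q4, matching the expected output.
normalized["QUARTER"] = normalized["QUARTER"].str.replace("QUARTER", "Q")

print(normalized.sort_values(["STORE", "YEAR", "QUARTER"]).to_string(index=False))
```

Three input rows with four pivot columns each yield twelve normalized rows.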

Below are the steps we will cover in this use case:

  • Read the Excel data file, using Microsoft Excel Input
  • Normalize the pivoted data, using Row Normaliser
  • Write the normalized data to an Oracle database table, using Table output

Find below the use case transformation:

Transformation

First, in order to read the Excel file, we use the Microsoft Excel Input step. Let's name the step XL_Budget. Select the spreadsheet type Excel 2007 XLSX (Apache POI), then browse to the input Excel file and click the Add button.

Excel files

Next, go to the Sheets tab and click Get sheetnames, then select the correct data sheet. In my case it is Sheet1. In the Content tab, check the Header option, as we have a header record in our Excel sheet.

Excel sheets

In the Fields tab, click the Get fields from header row button. Check and, if needed, modify the field names and data types. Preview your records to confirm that PDI reads the Excel file content as expected.

Excel fields

Next we add a Row Normaliser step. Set the Type field to QUARTER; this will be the output field name. Then, in the Fields section, map the source fields QUARTER1 through QUARTER4 and set their types to Q1, Q2, Q3 and Q4 respectively, so the QUARTER output field will carry these type values. Name the corresponding new field BUDGET; this field will hold the amount figure for each quarter type.

Row Normaliser
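
The dialog settings above amount to a small mapping: each source field produces one output row whose QUARTER value is the configured type, and every pair writes its amount to the same new field, BUDGET. A plain-Python sketch of that per-row expansion (field names as configured above; this is only an illustration, not how PDI runs internally):

```python
# Fields section of the Row Normaliser dialog, as (source field, type) pairs.
field_map = [
    ("QUARTER1", "Q1"),
    ("QUARTER2", "Q2"),
    ("QUARTER3", "Q3"),
    ("QUARTER4", "Q4"),
]

def normalise(row):
    """Expand one denormalized row into one output row per mapped field."""
    return [
        {"STORE": row["STORE"], "YEAR": row["YEAR"],
         "QUARTER": qtype, "BUDGET": row[src]}
        for src, qtype in field_map
    ]

example = {"STORE": "Clementi", "YEAR": 2016,
           "QUARTER1": 50000, "QUARTER2": 70000,
           "QUARTER3": 65000, "QUARTER4": 85000}

for out in normalise(example):
    print(out)
```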

In order to write the normalized data into a relational database, in our case Oracle, we use the Table output step. Let's name this step STG_BUDGET. First we need to create the connection object: click New or Wizard and create a connection.

Let's name the connection Con_Staging. Select the connection type Oracle and the access method Native (JDBC). The Oracle JDBC driver might not be available to PDI; in that case, download the Oracle JDBC driver ojdbc6.jar and place it in the PDI lib directory, i.e. data-integration/lib/ . Next, define the hostname, database name, port, username and password to connect to your Oracle instance. Click Test to check the connection, then click OK.

Database connection

Once the database connection is ready, browse to the relevant target schema. Let's keep the target table name as STG_BUDGET and select the Truncate table option.

Table output

Next, select Specify database fields and click the Get fields button under the Database fields tab; this lists the fields available from the prior step. Then click SQL to see the DDL required to create the table in the target schema. Once verified, click Execute to create the table directly in the database. Note that the database user for this connection object must have the grant to create tables. That's all: the target RDBMS table is ready to be loaded.

Table fields
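
With the Truncate table option selected, Table output empties the target table and then batch-inserts the incoming rows. A minimal stand-in sketch of that truncate-and-load pattern using Python's sqlite3 module (the real connection is Oracle via JDBC, handled by PDI; the column types below are illustrative, where PDI's generated DDL would use Oracle types such as VARCHAR2 and NUMBER):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# DDL in the spirit of what the SQL button generates (types are illustrative).
cur.execute("""
    CREATE TABLE STG_BUDGET (
        STORE   TEXT,
        YEAR    INTEGER,
        QUARTER TEXT,
        BUDGET  INTEGER
    )
""")

# Sample normalized rows from the expected dataset above.
normalized = [
    ("Clementi", 2016, "Q1", 50000),
    ("Clementi", 2016, "Q2", 70000),
    ("River Valley", 2016, "Q1", 30000),
    ("Clementi", 2017, "Q1", 55000),
]

# "Truncate table" option: empty the table before each load.
cur.execute("DELETE FROM STG_BUDGET")
cur.executemany("INSERT INTO STG_BUDGET VALUES (?, ?, ?, ?)", normalized)
con.commit()

print(cur.execute("SELECT COUNT(*) FROM STG_BUDGET").fetchone()[0])
```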

Let us run the Transformation from Spoon and validate the output data in the table from the Preview tab of the Execution Results section.

Execution results

So we have successfully read Excel data, normalized the pivoted data and written it to an RDBMS. In the next article we will look at the reverse scenario, data denormalization, using the Row Denormaliser step.