Data Integration

Pentaho Data Integration

This section contains a number of articles and tutorials on Pentaho Data Integration™, popularly known as the Kettle ETL tool.

Pages

Updated on May 14, 2019

Getting Started Pentaho Data Integration

Spoon, the GUI workspace of Pentaho Data Integration, allows us to create jobs and transformations that extract data from heterogeneous sources, transform that data using built-in steps and functions to meet business requirements, and then load it into a wide variety of target databases or file systems for further analysis.
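The extract-transform-load flow described above can be sketched in plain Python. This is a conceptual illustration only; PDI performs these stages through its graphical steps, and all field names below are made up:

```python
# A conceptual extract-transform-load flow, mirroring what a PDI
# transformation does through its graphical steps.
# All field and value names here are illustrative.

def extract():
    # Extract: rows as they might arrive from a heterogeneous source
    return [
        {"name": "alice", "amount": "100"},
        {"name": "bob", "amount": "250"},
    ]

def transform(rows):
    # Transform: apply business rules (uppercase names, cast amounts)
    return [
        {"name": r["name"].upper(), "amount": int(r["amount"])}
        for r in rows
    ]

def load(rows, target):
    # Load: append the cleaned rows to a target (here, a list
    # standing in for a database table or output file)
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```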

Using Pentaho Row Normaliser step

In this article we will learn how to use the Pentaho Data Integration Row Normaliser step. The Row Normaliser step is used to normalise data from a pivoted or denormalised table; it changes how the relationship between rows is displayed. For each value in each pivot column, Row Normaliser produces a row in the output data set. We can create new field sets to specify more than one pivot column. In short, it converts columns to rows.
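The columns-to-rows behaviour of the step can be sketched in a few lines of Python. This is a rough conceptual equivalent, not the step itself, and the column names are illustrative:

```python
# Row Normaliser in miniature: for each pivot column, emit one output
# row holding the column's name and its value.

def normalise(rows, key_field, pivot_fields, type_field, value_field):
    out = []
    for row in rows:
        for col in pivot_fields:
            out.append({
                key_field: row[key_field],
                type_field: col,        # which pivot column this came from
                value_field: row[col],  # its value
            })
    return out

# One denormalised row with two pivot columns, q1 and q2
denormalised = [{"product": "A", "q1": 10, "q2": 20}]
normalised = normalise(denormalised, "product", ["q1", "q2"], "quarter", "sales")
print(normalised)
```

One input row with two pivot columns yields two output rows, one per column value.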

Using Pentaho Row Denormaliser step

In this article we will learn how to use the Pentaho Data Integration Row Denormaliser step. The Row Denormaliser step is used to denormalise, or pivot, data from a normalised data set or table. It combines data from several rows into one row by creating new target columns/fields: for each unique value of the key field within each group defined by the selected group fields, the step produces a target column in the output data set. In short, it builds one row of data from several existing rows, i.e. it pivots rows to columns.
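The rows-to-columns behaviour is the mirror image of the Row Normaliser. A rough Python sketch of the idea, with illustrative field names:

```python
# Row Denormaliser in miniature: group rows by a group field, then turn
# each distinct key-field value into a target column on the grouped row.

def denormalise(rows, group_field, key_field, value_field):
    out = {}
    for row in rows:
        # one output row per group value
        target = out.setdefault(
            row[group_field], {group_field: row[group_field]}
        )
        # each distinct key value becomes a new target column
        target[row[key_field]] = row[value_field]
    return list(out.values())

normalised = [
    {"product": "A", "quarter": "q1", "sales": 10},
    {"product": "A", "quarter": "q2", "sales": 20},
]
pivoted = denormalise(normalised, "product", "quarter", "sales")
print(pivoted)
```

Two normalised rows for product A collapse into a single row with one column per quarter.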

Handling XML source files in Pentaho Data Integration

This article will demonstrate how to read data from XML-based source files using Pentaho Data Integration. To read the source XML files we will use the Get data from XML step. We will first read a simple XML file, followed by a complex, nested hierarchical XML data file.
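The Get data from XML step selects repeating nodes with an XPath "loop" expression and maps child elements to output fields. A rough Python equivalent using the standard library, with a made-up sample document:

```python
import xml.etree.ElementTree as ET

# Rough equivalent of the Get data from XML step: select repeating
# nodes with a loop path and map child elements to output fields.
# The XML below is a made-up sample.

xml_source = """
<orders>
  <order><id>1</id><customer>alice</customer></order>
  <order><id>2</id><customer>bob</customer></order>
</orders>
"""

root = ET.fromstring(xml_source)
rows = [
    {"id": int(o.findtext("id")), "customer": o.findtext("customer")}
    for o in root.findall("order")  # the "loop XPath" in PDI terms
]
print(rows)
```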

Handling JSON source files in Pentaho Data Integration

This article will demonstrate how to read data from JSON-based source files using Pentaho Data Integration. To read the source JSON files we will use the Json Input step. We will read a complex JSON file containing a nested hierarchical JSON array.
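The Json Input step maps output fields to JSONPath expressions. A rough Python sketch of what such a mapping does with a nested array, flattening it by hand; the document and field names are made up:

```python
import json

# Rough equivalent of the Json Input step, which maps fields to
# JSONPath expressions: flatten a nested array into one output row
# per inner element. The document below is a made-up sample.

doc = json.loads("""
{
  "orders": [
    {"id": 1, "items": [{"sku": "x"}, {"sku": "y"}]},
    {"id": 2, "items": [{"sku": "z"}]}
  ]
}
""")

# One output row per element of the nested items array
rows = [
    {"order_id": o["id"], "sku": item["sku"]}
    for o in doc["orders"]
    for item in o["items"]
]
print(rows)
```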

Using Pentaho Http Client step

In this article we will learn how to use the Pentaho Data Integration HTTP Client step. The HTTP Client step sends HTTP requests to web URLs, along with optional parameters/arguments, and in turn receives the HTTP response from the web methods.
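What the step does can be sketched with the Python standard library: append the configured parameters to the base URL as a query string, send the request, and capture the response body into a result field. The host and parameter names below are hypothetical:

```python
from urllib.parse import urlencode

# Sketch of the HTTP Client step's behaviour: build a GET URL from a
# base URL plus optional parameters. Host and parameters are hypothetical.

def build_request_url(base_url, params):
    return base_url + "?" + urlencode(params)

url = build_request_url(
    "http://example.com/api/rates",
    {"currency": "USD", "date": "2019-05-14"},
)
print(url)

# Actually sending the request and capturing the response body
# into a result field would then be:
#   from urllib.request import urlopen
#   with urlopen(url) as resp:
#       result = resp.read().decode("utf-8")
```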

XML file generation using Pentaho Data Integration

This article will demonstrate how to write data to XML files using Pentaho Data Integration. To write to an XML file we will use the XML Output step, which allows us to write rows from any source to one or more XML files.
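Conceptually the step wraps each incoming row in a row element under a parent element and serialises the tree. A minimal Python sketch of that idea; the element and field names are illustrative:

```python
import xml.etree.ElementTree as ET

# Sketch of what the XML Output step does: wrap each incoming row in
# a row element under a parent element, one child element per field,
# then serialise the tree. Element and field names are illustrative.

rows = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]

root = ET.Element("Rows")             # parent XML element
for r in rows:
    row_el = ET.SubElement(root, "Row")
    for field, value in r.items():    # one child element per field
        ET.SubElement(row_el, field).text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```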

DW Implementation Using PDI

Data integration (DI) is the process by which information from multiple databases is consolidated for use in a single application. ETL (extract, transform, and load) is the most common form of DI found in data warehousing. In this article we will look into the PDI architecture and the product features which make it an ETL tool well suited to a typical DM/DW landscape. This article is meant primarily for technical architects, consultants, and developers who are evaluating PDI's capabilities.

PDI Transformation Steps

As a continuation of the previous article, let us look at some of the key PDI steps. Below are a few examples of built-in PDI transformation steps used widely in a DI/DW/DM landscape.

© DWBI