Handling XML source files in Pentaho Data Integration

Updated on Oct 03, 2020

This article demonstrates how to read data from XML source files using Pentaho Data Integration. To read the XML source we use the Get data from XML step. We first read a simple XML file and then a more complex, nested hierarchical XML file.

Let us take a quick look at our first sample XML file, which holds information about books in a bookstore.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
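
As a rough illustration of what the Get data from XML step does with this file, the sketch below uses Python's standard library rather than Pentaho itself. It assumes the step loops over each /bookstore/book node and reads every field through a path relative to that node; the loop path, the field names and the read_books helper are assumptions made for this example, since the actual step settings are only visible in the screenshots.

```python
import xml.etree.ElementTree as ET

# Copy of the sample bookstore file from the article.
BOOKSTORE_XML = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def read_books(xml_text):
    """Return one row (dict) per <book> node, similar to one output row per loop node."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    rows = []
    # "book" relative to the root corresponds to the assumed loop path /bookstore/book.
    for book in root.findall("book"):
        rows.append({
            "category": book.get("category"),      # attribute of the loop node
            "title": book.findtext("title"),       # child element text
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return rows


if __name__ == "__main__":
    for row in read_books(BOOKSTORE_XML):
        print(row)
```

Each dictionary plays the role of one row that the Table output step would write to the database.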

Please check the self-explanatory implementation screenshots below:

Transformation
Get data from XML
Get data from XML - Content
Get data from XML - Fields
Table output
Table output - Database fields
Execution Results

Let us take a quick look at our next sample XML file, which holds a sales order header as well as the sales order line items.

<?xml version="1.0" ?>
<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </SellTo>
      <BillTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </BillTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
          <ShipmentDate />
          <Comment>Confirm the voltage is 75W</Comment>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
          <ShipmentDate />
          <Comment />
        </Item>
        <Item PartNum="LS-2">
          <ProductName>Cables for Loudspeakers</ProductName>
          <Quantity>10</Quantity>
          <UnitPrice>21,00</UnitPrice>
          <ShipmentDate />
          <Comment />
        </Item>
      </Lines>
      <Contact>Mr. Andy Toal</Contact>
      <Terms>14 days</Terms>
    </Header>
  </SalesOrder>
</HeaderLines>
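
The header and line items sit at different levels of this file, so they can be read as two separate row sets and then joined. The sketch below illustrates that idea in plain Python on an abridged copy of the sample. It is not the Pentaho implementation: the order_no key (the position of the SalesOrder in the file), the field names and the helper functions are hypothetical, and a dictionary lookup stands in for the sort and merge join steps. It also shows one possible way to handle the decimal-comma prices and the empty UnitPrice elements.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sales order sample (fewer fields and items, same structure).
SALES_XML = """<?xml version="1.0" ?>
<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
      </SellTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
        </Item>
      </Lines>
      <Contact>Mr. Andy Toal</Contact>
    </Header>
  </SalesOrder>
</HeaderLines>"""


def parse_price(text):
    """Convert a decimal-comma string such as '129,00' to float; empty becomes None."""
    if text is None or not text.strip():
        return None
    return float(text.replace(",", "."))


def read_headers(root):
    """One row per Header node; order_no is a hypothetical join key."""
    rows = []
    for order_no, header in enumerate(root.findall("SalesOrder/Header"), start=1):
        rows.append({
            "order_no": order_no,
            "date": header.get("Date"),
            "type": header.get("Type"),
            "sell_to_name": header.findtext("SellTo/Name"),
            "contact": header.findtext("Contact"),
        })
    return rows


def read_lines(root):
    """One row per Item node, tagged with the order_no of its parent Header."""
    rows = []
    for order_no, header in enumerate(root.findall("SalesOrder/Header"), start=1):
        for item in header.findall("Lines/Item"):
            rows.append({
                "order_no": order_no,
                "part_num": item.get("PartNum"),
                "product_name": item.findtext("ProductName"),
                "quantity": int(item.findtext("Quantity")),
                "unit_price": parse_price(item.findtext("UnitPrice")),
            })
    return rows


def join_header_lines(headers, lines):
    """Attach the header fields to every line row (lookup join on order_no)."""
    by_key = {h["order_no"]: h for h in headers}
    return [{**by_key[line["order_no"]], **line} for line in lines]


if __name__ == "__main__":
    root = ET.fromstring(SALES_XML)
    for row in join_header_lines(read_headers(root), read_lines(root)):
        print(row)
```

Reading the two levels separately keeps each row set flat, which is what allows the header and the line items to be loaded into separate tables as well as joined.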

Please check the implementation screenshots below:

Transformation
XML Hdr Source
XML Hdr Content
XML Hdr Fields
Sort Hdr
XML Dtl Source
XML Dtl Content
XML Dtl Fields
Sort Dtl
Merge Join
Select values
Table output Hdr
Table output Dtl
Execution Results