This article demonstrates how to read data from XML source files with Pentaho Data Integration, using the Get data from XML step. We first read a simple XML file, then a more complex, nested hierarchical XML file.

Let us take a quick look at our first sample XML file, which contains information about books in a bookstore.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
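
As a rough illustration of what reading this file amounts to, the sketch below uses Python's standard library rather than PDI itself. It assumes one output row per book element (comparable to looping over /bookstore/book) and one field per child element plus the category attribute. The function name read_books and the field names are illustrative choices, not taken from the transformation.

```python
import xml.etree.ElementTree as ET

# Same content as the sample bookstore file above.
BOOKSTORE_XML = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def read_books(xml_text):
    """Return one dict per <book> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root is <bookstore>
    rows = []
    for book in root.findall("book"):  # loop over each repeating node
        rows.append({
            "category": book.get("category"),      # attribute
            "title": book.findtext("title"),       # child elements
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return rows


book_rows = read_books(BOOKSTORE_XML)
for row in book_rows:
    print(row)
```

Each dictionary corresponds to one row that would be sent on to the table output; the type conversions for year and price are assumptions about the target column types.
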

Screenshots (not reproduced here): Transformation, XML Source, XML Content, XML Fields, Table output, Table fields, Execution results.

Let us take a quick look at our next sample XML file, which contains a sales order header as well as its sales order line items.

<?xml version="1.0" ?>
<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </SellTo>
      <BillTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </BillTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
          <ShipmentDate />
          <Comment>Confirm the voltage is 75W</Comment>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
          <ShipmentDate />
          <Comment />
        </Item>
        <Item PartNum="LS-2">
          <ProductName>Cables for Loudspeakers</ProductName>
          <Quantity>10</Quantity>
          <UnitPrice>21,00</UnitPrice>
          <ShipmentDate />
          <Comment />
        </Item>
      </Lines>
      <Contact>Mr. Andy Toal</Contact>
      <Terms>14 days</Terms>
    </Header>
  </SalesOrder>
</HeaderLines>
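
The nested file is handled by reading the header and the line items separately and then joining them. The sketch below mimics that idea in Python rather than PDI, on an abbreviated copy of the sample (two items, no BillTo). The sample has no order identifier, so the join key order_no (the position of each SalesOrder element) is purely an assumption for illustration, as are the helper and field names. A PDI Merge Join expects both inputs sorted on the join key, which is why the flow includes sort steps; the dictionary lookup used here does not need sorted input.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sales order sample above.
SALES_XML = """<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <City>Birmingham</City>
      </SellTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
        </Item>
      </Lines>
      <Contact>Mr. Andy Toal</Contact>
      <Terms>14 days</Terms>
    </Header>
  </SalesOrder>
</HeaderLines>"""


def parse_decimal(text):
    """Convert '129,00' to 129.0; empty or missing values become None."""
    if text is None or not text.strip():
        return None
    return float(text.replace(",", "."))


def read_headers(root):
    """One row per SalesOrder header; order_no is an assumed key."""
    rows = []
    for order_no, order in enumerate(root.findall("SalesOrder"), start=1):
        header = order.find("Header")
        rows.append({
            "order_no": order_no,
            "order_date": header.get("Date"),
            "order_type": header.get("Type"),
            "sell_to_name": header.findtext("SellTo/Name"),
            "sell_to_country": header.find("SellTo").get("CountryRegion"),
            "contact": header.findtext("Contact"),
            "terms": header.findtext("Terms"),
        })
    return rows


def read_lines(root):
    """One row per Item, tagged with the same assumed order_no key."""
    rows = []
    for order_no, order in enumerate(root.findall("SalesOrder"), start=1):
        for item in order.findall("Header/Lines/Item"):
            rows.append({
                "order_no": order_no,
                "part_num": item.get("PartNum"),
                "product_name": item.findtext("ProductName"),
                "quantity": int(item.findtext("Quantity")),
                "unit_price": parse_decimal(item.findtext("UnitPrice")),
            })
    return rows


def join_rows(headers, lines):
    """Inner join of detail rows to their header on order_no."""
    by_order = {h["order_no"]: h for h in headers}
    return [{**by_order[l["order_no"]], **l} for l in lines]


root = ET.fromstring(SALES_XML)
order_rows = join_rows(read_headers(root), read_lines(root))
for row in order_rows:
    print(row)
```

Each joined row carries the header columns repeated alongside one line item, which is the flattened shape a relational table output would expect. Empty elements such as UnitPrice are mapped to None here; how they should be stored in the target table is a separate design decision.
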

Screenshots (not reproduced here): Transformation Nested, XML Hdr Source, XML Hdr Content, XML Hdr Fields, Sort Hdr, XML Dtl Source, XML Dtl Content, XML Dtl Fields, Sort Dtl, Merge Join, Select values, Nested XML Table output, Nested XML Table fields, Nested XML Execution results.