Handling XML source files in Pentaho Data Integration
This article demonstrates how to read data from XML-based source files using Pentaho Data Integration. To read the source XML file, we will use the Get data from XML step. We will first read a simple XML file, and then a more complex, nested hierarchical XML data file.
Let us take a quick look at our first sample XML file, which contains information about books in a bookstore.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
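To make the target output concrete, here is a minimal Python sketch of the rows we would expect from this file: one row per `book` element, with the `category` attribute and the child elements as fields. This is not how Pentaho Data Integration works internally. It only mirrors the idea of looping over a repeating node (a loop XPath such as `/bookstore/book` is assumed here) and extracting fields relative to it. The function name `read_books` and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the bookstore sample (XML declaration omitted for brevity).
BOOKSTORE_XML = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def read_books(xml_text):
    """Return one dict per <book>, similar to one output row per loop node."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    rows = []
    # "book" relative to the root corresponds to the path /bookstore/book.
    for book in root.findall("book"):
        rows.append({
            "category": book.get("category"),         # attribute of the loop node
            "title": book.findtext("title"),          # child element text
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),       # convert to Integer
            "price": float(book.findtext("price")),   # convert to Number
        })
    return rows


rows = read_books(BOOKSTORE_XML)
for row in rows:
    print(row)
```

Each dictionary corresponds to one output row, so this sample yields two rows, one per book.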
Please check the self-explanatory implementation screenshots below:
Let us take a quick look at our next sample XML file, which contains a sales order header as well as the sales order line items.
<?xml version="1.0" ?>
<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </SellTo>
      <BillTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
        <Address>192 Market Square</Address>
        <City>Birmingham</City>
        <Zip>B27 4KT</Zip>
      </BillTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
          <ShipmentDate />
          <Comment>Confirm the voltage is 75W</Comment>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
          <ShipmentDate />
          <Comment />
        </Item>
        <Item PartNum="LS-2">
          <ProductName>Cables for Loudspeakers</ProductName>
          <Quantity>10</Quantity>
          <UnitPrice>21,00</UnitPrice>
          <ShipmentDate />
          <Comment />
        </Item>
      </Lines>
      <Contact>Mr. Andy Toal</Contact>
      <Terms>14 days</Terms>
    </Header>
  </SalesOrder>
</HeaderLines>
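With nested data like this, the usual goal is one row per line item, with the header values repeated on every row. The following Python sketch illustrates that flattening under a few assumptions: the loop node is taken to be each `Item` element (roughly the path `/HeaderLines/SalesOrder/Header/Lines/Item`), the header fields are read from the enclosing `Header` element, and `129,00` is treated as a decimal-comma number. The function names and output field names are hypothetical, and the embedded XML is an abridged copy of the sample above.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sales order sample (only the fields used below).
SALES_ORDER_XML = """<HeaderLines>
  <SalesOrder>
    <Header Date="17-02-04" Type="Quote">
      <SellTo CountryRegion="GB">
        <Name>The Cannon Group PLC</Name>
      </SellTo>
      <Lines>
        <Item PartNum="LS-150">
          <ProductName>Loudspeaker, Cherry, 150W</ProductName>
          <Quantity>8</Quantity>
          <UnitPrice>129,00</UnitPrice>
        </Item>
        <Item PartNum="LS-MAN-10">
          <ProductName>Manual for Loudspeakers</ProductName>
          <Quantity>20</Quantity>
          <UnitPrice />
        </Item>
        <Item PartNum="LS-2">
          <ProductName>Cables for Loudspeakers</ProductName>
          <Quantity>10</Quantity>
          <UnitPrice>21,00</UnitPrice>
        </Item>
      </Lines>
    </Header>
  </SalesOrder>
</HeaderLines>"""


def parse_decimal(text):
    """Convert '129,00' to 129.0; empty elements such as <UnitPrice /> give None."""
    if text is None or not text.strip():
        return None
    return float(text.replace(",", "."))


def read_order_lines(xml_text):
    """Return one dict per <Item>, repeating the header fields on each row."""
    root = ET.fromstring(xml_text)  # root is the <HeaderLines> element
    rows = []
    for header in root.findall("SalesOrder/Header"):
        sell_to = header.find("SellTo")
        for item in header.findall("Lines/Item"):
            rows.append({
                # Header-level fields, repeated for every line item
                "order_date": header.get("Date"),  # kept as text; format not stated
                "order_type": header.get("Type"),
                "sell_to_name": sell_to.findtext("Name"),
                "sell_to_country": sell_to.get("CountryRegion"),
                # Item-level fields
                "part_num": item.get("PartNum"),
                "product_name": item.findtext("ProductName"),
                "quantity": int(item.findtext("Quantity")),
                "unit_price": parse_decimal(item.findtext("UnitPrice")),
            })
    return rows


rows = read_order_lines(SALES_ORDER_XML)
for row in rows:
    print(row)
```

The nested loops make the parent/child relationship explicit. In a step that loops directly over the `Item` nodes, the same header values would typically be reached with parent-relative XPath expressions such as `../../@Date`, which is an assumption here rather than something shown in the original walkthrough.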