In this yet another "DWBI Concepts' Original article", we test the performance of Informatica PowerCentre 8.5 Joiner transformation versus Oracle 10g database join. This article gives a crucial insight to application developer in order to take informed decision regarding performance tuning.

Which is the fastest? Informatica or Oracle?

Original Research In our previous article, we tested the performance of ORDER BY operation in Informatica and Oracle and found that, in our test condition, Oracle performs sorting 14% speedier than Informatica. This time we will look into the JOIN operation, not only because JOIN is the single most important data set operation but also because performance of JOIN can give crucial data to a developer in order to develop proper push down optimization manually.

Informatica is one of the leading data integration tools in today’s world. More than 4,000 enterprises worldwide rely on Informatica to access, integrate and trust their information assets with it. On the other hand, Oracle database is arguably the most successful and powerful RDBMS system that is trusted from 1980s in all sorts of business domain and across all major platforms. Both of these systems are bests in the technologies that they support. But when it comes to the application development, developers often face challenge to strike the right balance of operational load sharing between these systems. This article will help them to take the informed decision.

Which JOINs data faster? Oracle or Informatica?

As an application developer, you have the choice of either using joining syntaxes in database level to join your data or using JOINER TRANSFORMATION in Informatica to achieve the same outcome. The question is – which system performs this faster?

Test Preparation

We will perform the same test with 4 different data points (data volumes) and log the results. We will start with 1 million data in detail table and 0.1 million in master table. Subsequently we will test with 2 million, 4 million and 6 million detail table data volumes and 0.2 million, 0.4 million and 0.6 million master table data volumes. Here are the details of the setup we will use,

  1. Oracle 10g database as relational source and target
  2. Informatica PowerCentre 8.5 as ETL tool
  3. Database and Informatica setup on different physical servers using HP UNIX
  4. Source database table has no constraint, no index, no database statistics and no partition
  5. Source database table is not available in Oracle shared pool before the same is read
  6. There is no session level partition in Informatica PowerCentre
  7. There is no parallel hint provided in extraction SQL query
  8. Informatica JOINER has enough cache size

We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre designer. The first mapping m_db_side_join will use an INNER JOIN clause in the source qualifier to sort data in database level. Second mapping m_Infa_side_join will use an Informatica JOINER to JOIN data in informatica level. We have executed these mappings with different data points and logged the result.

Further to the above test we will execute m_db_side_join mapping once again, this time with proper database side indexes and statistics and log the results.

Result

The following graph shows the performance of Informatica and Database in terms of time taken by each system to sort data. The average time is plotted along vertical axis and data points are plotted along horizontal axis.

Data PointsMaster Table Record CountDetail Table Record Count
10.1 M1 M
20.2 M2 M
30.4 M4 M
40.6 M6 M

Database-vs-Informatica

Verdict

In our test environment, Oracle 10g performs JOIN operation 24% faster than Informatica Joiner Transformation while without Index and 42% faster with Database Index

Assumption

  1. Average server load remains same during all the experiments
  2. Average network speed remains same during all the experiments

1. This data can only be used for performance comparison but cannot be used for performance benchmarking.
2. This data is only indicative and may vary in different testing conditions.


Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Working with Informatica Flatfiles

    In this article series we will try to cover all the possible scenarios related to flatfiles in Informatica.

  • PowerCenter SOA Components

    PowerCenter has a Service-Oriented Architecture that provides the ability to scale services and share resources across multiple machines. Let us know more about the components and services associated with Powercenter.

  • Using Informatica Normalizer Transformation

    Normalizer transformation is a native transformation in Informatica that can ease many complex data transformation requirements. Learn how to effectively use normalizer in this tutorial.

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • How to Tune Performance of Informatica Joiner Transformation

    Joiner transformation allows you to join two heterogeneous sources in the Informatica mapping. You can use this transformation to perform INNER and OUTER joins between two input streams. For performance reasons, I recommend you ONLY use JOINER...

  • Informatica Performance Tuning - Complete Guide

    This article is a comprehensive guide to the techniques and methodologies available for tuning the performance of Informatica PowerCentre ETL tool. It's a one stop performance tuning manual for Informatica.

  • Implementing Informatica Persistent Cache

    You must have noticed that the "time" Informatica takes to build the lookup cache can be too much sometimes depending on the lookup table size/volume. Using Persistent Cache, you may save lot of your time. This article describes how to do it.

  • Aggregation with out Informatica Aggregator

    Since Informatica process data on row by row basis, it is generally possible to handle data aggregation operation even without an Aggregator Transformation. On certain cases, you may get huge performance gain using this technique!

  • Challenges of Informatica Partitioning with Sequencing

    In the previous article, we showed how surrogate keys can be generated without using Sequence Generator transformation. However, if Informatica partitioning is implemented for such cases, then since each partition pipeline will call the lookup...

  • CDC Implementation using Flatfile

    This article shows how to use a flatfile to implement Change data Capture. Suppose we want to maintain the last extraction date in a flatfile, based on that value we want to capture the changed data of our business table.