Informatica

Informatica Join Vs Database Join

Saurav Mitra

In this yet another "DWBI Concepts' Original article", we test the performance of Informatica PowerCentre 8.5 Joiner transformation versus Oracle 10g database join. This article gives a crucial insight to application developer in order to take informed decision regarding performance tuning.

Which is the fastest? Informatica or Oracle?

Original Research

In our previous article, we tested the performance of ORDER BY operation in Informatica and Oracle and found that, in our test condition, Oracle performs sorting 14% speedier than Informatica. This time we will look into the JOIN operation, not only because JOIN is the single most important data set operation but also because performance of JOIN can give crucial data to a developer in order to develop proper push down optimization manually.

Informatica is one of the leading data integration tools in today’s world. More than 4,000 enterprises worldwide rely on Informatica to access, integrate and trust their information assets with it. On the other hand, Oracle database is arguably the most successful and powerful RDBMS system that is trusted from 1980s in all sorts of business domain and across all major platforms. Both of these systems are bests in the technologies that they support. But when it comes to the application development, developers often face challenge to strike the right balance of operational load sharing between these systems. This article will help them to take the informed decision.

Which JOINs data faster? Oracle or Informatica?

As an application developer, you have the choice of either using joining syntaxes in database level to join your data or using JOINER TRANSFORMATION in Informatica to achieve the same outcome. The question is – which system performs this faster?

Test Preparation

We will perform the same test with 4 different data points (data volumes) and log the results. We will start with 1 million data in detail table and 0.1 million in master table. Subsequently we will test with 2 million, 4 million and 6 million detail table data volumes and 0.2 million, 0.4 million and 0.6 million master table data volumes. Here are the details of the setup we will use,

  1. Oracle 10g database as relational source and target
  2. Informatica PowerCentre 8.5 as ETL tool
  3. Database and Informatica setup on different physical servers using HP UNIX
  4. Source database table has no constraint, no index, no database statistics and no partition
  5. Source database table is not available in Oracle shared pool before the same is read
  6. There is no session level partition in Informatica PowerCentre
  7. There is no parallel hint provided in extraction SQL query
  8. Informatica JOINER has enough cache size

We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre designer. The first mapping m_db_side_join will use an INNER JOIN clause in the source qualifier to sort data in database level. Second mapping m_Infa_side_join will use an Informatica JOINER to JOIN data in informatica level. We have executed these mappings with different data points and logged the result.

Further to the above test we will execute m_db_side_join mapping once again, this time with proper database side indexes and statistics and log the results.

Result

The following graph shows the performance of Informatica and Database in terms of time taken by each system to sort data. The average time is plotted along vertical axis and data points are plotted along horizontal axis.

Data PointsMaster Table Record CountDetail Table Record Count
10.1 M1 M
20.2 M2 M
30.4 M4 M
40.6 M6 M
Database-vs-Informatica

Verdict

In our test environment, Oracle 10g performs JOIN operation 24% faster than Informatica Joiner Transformation while without Index and 42% faster with Database Index

Assumption

  1. Average server load remains same during all the experiments
  2. Average network speed remains same during all the experiments

1. This data can only be used for performance comparison but cannot be used for performance benchmarking.
2. This data is only indicative and may vary in different testing conditions.