Comparing Performance of SORT operation (Order By) in Informatica and Oracle
In this "DWBI Concepts' Original article", we pit the Oracle database against Informatica PowerCenter to find out which of the two sorts data faster. The article gives application developers crucial insight for making informed decisions about performance tuning.
Which is faster: Informatica or Oracle?
Informatica is one of the leading data integration tools in today’s world. More than 4,000 enterprises worldwide rely on Informatica to access, integrate and trust their information assets. On the other hand, the Oracle database is arguably the most successful and powerful RDBMS, trusted since the 1980s across all sorts of business domains and on all major platforms. Both systems are among the best in the technologies they support. But when it comes to application development, developers often face the challenge of striking the right balance in sharing the operational load between these two systems.
Think about a typical ETL operation in enterprise-level data integration. Much of the data processing can be redirected either to the database or to the ETL tool. In general, both the database and the ETL tool are reasonably capable of performing such operations with almost the same efficiency. But to achieve optimal performance, a developer must carefully decide which system to entrust with each individual processing task.
In this article, we take a basic database operation, sorting, and put these two systems to the test to determine which of them, if either, does it faster.
Which sorts data faster? Oracle or Informatica?
As an application developer, you can either use ORDER BY at the database level to sort your data or use a SORTER TRANSFORMATION in Informatica to achieve the same outcome. The question is: which system does this faster?
We will perform the same test at different data points (data volumes) and log the results. We will start with 1 million records and double the volume at each subsequent data point. Here are the details of the setup we will use:
- Oracle 10g database as relational source and target
- Informatica PowerCenter 8.5 as ETL tool
- Database and Informatica set up on separate physical servers running HP UNIX
- Source database table has no constraints, no indexes, no database statistics and no partitions
- Source table data is not cached in Oracle memory (the SGA) before it is read
- No session-level partitioning in Informatica PowerCenter
- No parallel hint in the extraction SQL query
- The source table has 10 columns, and the first 8 columns are used for sorting
- The Informatica Sorter has a sufficient cache size
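The volume schedule and table shape described above can be sketched in Python. This is a minimal illustration under our own assumptions: the number of data points (`count`), the integer value range and the helper names `data_points` and `make_rows` are hypothetical and are not taken from the original test setup.

```python
import random


def data_points(start: int = 1_000_000, count: int = 5) -> list[int]:
    """Return `count` data volumes, starting at `start` and doubling each time.

    The default `count` is a hypothetical choice; the article does not state
    how many data points were used.
    """
    return [start * 2**i for i in range(count)]


def make_rows(n: int, n_cols: int = 10, seed: int = 42) -> list[tuple[int, ...]]:
    """Generate `n` synthetic rows with `n_cols` integer columns.

    A small value range (0-9) means many rows share leading key values,
    so the sort has to compare several of the first 8 columns.
    """
    rng = random.Random(seed)  # fixed seed keeps the data reproducible
    return [tuple(rng.randint(0, 9) for _ in range(n_cols)) for _ in range(n)]


if __name__ == "__main__":
    print(data_points())  # [1000000, 2000000, 4000000, 8000000, 16000000]
    print(make_rows(2))
```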
We created two Informatica PowerCenter mappings in the PowerCenter Designer. The first mapping, m_db_side_sort, uses an ORDER BY clause in the Source Qualifier to sort data at the database level. The second mapping, m_Infa_side_sort, uses an Informatica Sorter transformation to sort data at the Informatica level. We executed these mappings at each data point and logged the results.
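The difference between the two mappings can be illustrated with a toy analogue in Python. This is only a sketch under loose assumptions: an in-memory SQLite database (the standard library `sqlite3` module) stands in for Oracle and the ORDER BY in the Source Qualifier, while Python's built-in `sorted` with a key on the first 8 columns stands in for the Sorter transformation. The table name `src`, the column names `c1` to `c10` and all function names are hypothetical, and timings from this sketch say nothing about Oracle or Informatica performance.

```python
import random
import sqlite3
import time

N_COLS = 10      # the source table has 10 columns
N_SORT_KEYS = 8  # the first 8 columns are the sort keys
COLS = [f"c{i}" for i in range(1, N_COLS + 1)]


def load_source(conn: sqlite3.Connection, rows) -> None:
    """Create a plain source table (no constraints, no index) and load rows."""
    conn.execute(f"CREATE TABLE src ({', '.join(c + ' INTEGER' for c in COLS)})")
    placeholders = ", ".join("?" * N_COLS)
    conn.executemany(f"INSERT INTO src VALUES ({placeholders})", rows)


def db_side_sort(conn: sqlite3.Connection) -> list[tuple]:
    """Analogue of m_db_side_sort: the database sorts with ORDER BY."""
    order_by = ", ".join(COLS[:N_SORT_KEYS])
    sql = f"SELECT {', '.join(COLS)} FROM src ORDER BY {order_by}"
    return conn.execute(sql).fetchall()


def tool_side_sort(conn: sqlite3.Connection) -> list[tuple]:
    """Analogue of m_Infa_side_sort: read unsorted rows, then sort in the tool."""
    rows = conn.execute(f"SELECT {', '.join(COLS)} FROM src").fetchall()
    return sorted(rows, key=lambda r: r[:N_SORT_KEYS])


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(0)
    data = [tuple(rng.randint(0, 9) for _ in range(N_COLS)) for _ in range(50_000)]
    conn = sqlite3.connect(":memory:")
    load_source(conn, data)
    db_rows, db_secs = timed(db_side_sort, conn)
    tool_rows, tool_secs = timed(tool_side_sort, conn)
    # Both must agree on the order of the sort keys; rows that tie on all
    # 8 keys may legitimately come out in a different order.
    assert [r[:N_SORT_KEYS] for r in db_rows] == [r[:N_SORT_KEYS] for r in tool_rows]
    print(f"db-side: {db_secs:.3f}s, tool-side: {tool_secs:.3f}s")
```

Comparing only the 8 key columns is deliberate: neither ORDER BY nor a sorter guarantees a particular order for rows whose sort keys are all equal.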
The following graph shows the time each system took to sort the data. Time is plotted along the vertical axis and data volume along the horizontal axis.
The above experiment shows that the Oracle database performs the SORT operation faster than Informatica, by an average of 14%.
A few notes on these results:
- Average server load remained the same during all the experiments
- Average network speed remained the same during all the experiments
- This data can be used only for performance comparison, not for performance benchmarking
- This data is only indicative and may vary under different testing conditions
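The article does not say exactly how the 14% average was derived. One common reading, sketched below purely as an assumption, is to compute at each data point how much less time the database took relative to Informatica, and then average those percentages. The function name and the input format (elapsed seconds per data point) are hypothetical, and no measured values from the experiment are reproduced here.

```python
def average_percent_faster(db_secs: list[float], infa_secs: list[float]) -> float:
    """Mean, over all data points, of the time saved by the database,
    expressed as a percentage of Informatica's time.

    Assumed interpretation: at each data point
        saving = (infa - db) / infa * 100
    and the result is the arithmetic mean of those savings.
    """
    if len(db_secs) != len(infa_secs) or not db_secs:
        raise ValueError("need the same, non-zero number of timings for both systems")
    savings = [(infa - db) / infa * 100 for db, infa in zip(db_secs, infa_secs)]
    return sum(savings) / len(savings)
```

Dividing by the database time instead of the Informatica time would give a somewhat larger percentage for the same timings, so the baseline is worth stating alongside any such figure.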
To see the Informatica and Oracle performance comparison for the JOIN operation, refer to our companion article on that topic.