Similar to what we discussed regarding the Performance Tuning of Joiner Transformation, the basic rule for tuning aggregator is to avoid aggregator transformation altogether unless...

  • You really can not do the aggregation in the source qualifier SQL query (e.g. Flat File source)
  • Fields used for aggregation are derived inside the mapping

Tuning Aggregator Transformation

If you have to do the aggregation using Informatica aggregator, then ensure that all the columns used in the group by are sorted in the same order of group by and “Sorted Input” option is checked in the aggregator properties. Ensuring the input data is sorted is absolutely must in order to achieve better performance and we will soon know why.

Other things that need to be checked to increase aggregator performance are –

  • Check if “Case-Sensitive String Comparison” option is really required. Keeping this option checked (default) slows down the aggregator performance
  • Enough memory (RAM) is available to do the in memory aggregation. See below section for details.
  • Aggregator cache is partitioned

How to (and when to) set aggregator Data and Index cache size

As I mentioned before also, my advice is to leave the “Aggregator Data Cache Size” and “Aggregator Index Cache Size” options as Auto (default) in the transformation level and if required, set either of the followings in the session level (under “Config Object” tab) to allow Informatica allocate enough memory automatically for the transformation:

  1. Maximum Memory Allowed For Auto Memory Attributes
  2. Maximum Percentage of Total Memory Allowed For Auto Memory Attributes

However if you do have to set Data Cache/ Index Cache size yourself, please note that the value you set here is actually RAM memory requirement (and not disk space requirement) and hence, your mapping will fail if Informatica can not allocate the entire memory in RAM at the session initiation. And yes, this can happen often because you never know what other jobs are running in the server and what amount of RAM other jobs are really occupying while you run this job.

Having understood the risk, let’s now see the benefit of manually configuring the Index and Data Cache sizes. If you leave the index and data cache sizes to auto then if Informatica does not get enough memory during session run time, your job will not fail, instead Informatica will page-out the data to hard disk level. Since I/O performance of hard disk drive is 1000~ times slower than RAM, paging out to hard disk drive will have performance penalty. So by setting data and index cache size manually you can ensure that Informatica block this memory in the beginning of session run so that the cache is not paged-out to disk and the entire aggregation actually take place in RAM. Do this at your own risk.

Manually configuring index and data cache sizes can be beneficial if ensuring consistent session performance is your highest priority compared to session stability and operational steadiness. Basically you risk your operations (since it creates high chance of session failure) to obtain optimized performance.

The best way to determine the data and index cache size(s) is to check the session log of already executed session. Session log clearly shows these sizes in bytes. But this size depends on the row count. So keep some buffer (around 20% in most cases) on top of these sizes and use those values for the configuration.

Other way to determine Index and Data Cache sizes are, of course, to use the inbuilt Cache-size calculator accessible in session level.

Aggregator Cache Size Calculator

Using the Informatica Aggregator cache size calculator is a bit difficult (and lot inaccurate). The reason is to calculate cache size properly you will need to know the number of groups that the aggregator is going to process. The definition of number of groups is as below:

No. of Groups = Multiplication of cardinality values of each group by column

This means, suppose you group by store and product, and there are total 150 distinct stores and 10 distinct products, then no. of groups will be 150 X 10 = 1500.

This is inaccurate because, in most cases you can not ascertain how many distinct stores and product data will come on each load. You might have 150 stores and 10 products, but there is no guarantee that all the product will come on all the load. Hence the cache size you determine in this method is quite approximate.

You can, however, calculate the cache size in both the two methods discussed here and take the max of the values to be in safer side.

Next, read How to optimize the performance of Informatica Lookup transformations >>

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Implementing Informatica Incremental Aggregation

    Using incremental aggregation, we apply captured changes in the source data (CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture the changes, then we can configure the session to process those...

  • Informatica Tuning - Step by Step Approach

    This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • How to Tune Performance of Informatica Lookup Transformation

    To me, look-up is the single most important (and difficult) transformation that we need to consider while tuning performance of Informatica jobs. The choice and use of correct type of Look-Up can dramatically vary the session performance in...

  • How to Tune Performance of Informatica Joiner Transformation

    Joiner transformation allows you to join two heterogeneous sources in the Informatica mapping. You can use this transformation to perform INNER and OUTER joins between two input streams. For performance reasons, I recommend you ONLY use JOINER...

  • How to get Informatica Repository Information from Metadata

    Some metadata views can be very handy to get Informatica Repository Information. Know How.

  • Using Informatica Stored Procedure Transformation

    Stored Procedure Transformation - as the name suggests is used to execute stored procedures through Informatica ETL. It can also be used to call functions to return calculated values. The Stored Procedures that are to be executed should be...

  • Implementing Informatica Partitions

    Identification and elimination of performance bottlenecks will obviously optimize session performance. After tuning all the mapping bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the...

  • Informatica Excel Source

    This article is a guide on how to Unload data from EXCEL file system to target relational database using Informatica.

  • Calling C executable from Java Transform

    We are going to do is, to call C++ Executable from Informatica, using Passive Java Transform and capture the output of the C++ using Java and write the result to corresponding target column.