Similar to what we discussed regarding the Performance Tuning of Joiner Transformation, the basic rule for tuning aggregator is to avoid aggregator transformation altogether unless...

  • You really can not do the aggregation in the source qualifier SQL query (e.g. Flat File source)
  • Fields used for aggregation are derived inside the mapping

Tuning Aggregator Transformation

If you have to do the aggregation using Informatica aggregator, then ensure that all the columns used in the group by are sorted in the same order of group by and “Sorted Input” option is checked in the aggregator properties. Ensuring the input data is sorted is absolutely must in order to achieve better performance and we will soon know why.

Other things that need to be checked to increase aggregator performance are –

  • Check if “Case-Sensitive String Comparison” option is really required. Keeping this option checked (default) slows down the aggregator performance
  • Enough memory (RAM) is available to do the in memory aggregation. See below section for details.
  • Aggregator cache is partitioned

How to (and when to) set aggregator Data and Index cache size

As I mentioned before also, my advice is to leave the “Aggregator Data Cache Size” and “Aggregator Index Cache Size” options as Auto (default) in the transformation level and if required, set either of the followings in the session level (under “Config Object” tab) to allow Informatica allocate enough memory automatically for the transformation:

  1. Maximum Memory Allowed For Auto Memory Attributes
  2. Maximum Percentage of Total Memory Allowed For Auto Memory Attributes

However if you do have to set Data Cache/ Index Cache size yourself, please note that the value you set here is actually RAM memory requirement (and not disk space requirement) and hence, your mapping will fail if Informatica can not allocate the entire memory in RAM at the session initiation. And yes, this can happen often because you never know what other jobs are running in the server and what amount of RAM other jobs are really occupying while you run this job.

Having understood the risk, let’s now see the benefit of manually configuring the Index and Data Cache sizes. If you leave the index and data cache sizes to auto then if Informatica does not get enough memory during session run time, your job will not fail, instead Informatica will page-out the data to hard disk level. Since I/O performance of hard disk drive is 1000~ times slower than RAM, paging out to hard disk drive will have performance penalty. So by setting data and index cache size manually you can ensure that Informatica block this memory in the beginning of session run so that the cache is not paged-out to disk and the entire aggregation actually take place in RAM. Do this at your own risk.

Manually configuring index and data cache sizes can be beneficial if ensuring consistent session performance is your highest priority compared to session stability and operational steadiness. Basically you risk your operations (since it creates high chance of session failure) to obtain optimized performance.

The best way to determine the data and index cache size(s) is to check the session log of already executed session. Session log clearly shows these sizes in bytes. But this size depends on the row count. So keep some buffer (around 20% in most cases) on top of these sizes and use those values for the configuration.

Other way to determine Index and Data Cache sizes are, of course, to use the inbuilt Cache-size calculator accessible in session level.

Aggregator Cache Size Calculator

Using the Informatica Aggregator cache size calculator is a bit difficult (and lot inaccurate). The reason is to calculate cache size properly you will need to know the number of groups that the aggregator is going to process. The definition of number of groups is as below:

No. of Groups = Multiplication of cardinality values of each group by column

This means, suppose you group by store and product, and there are total 150 distinct stores and 10 distinct products, then no. of groups will be 150 X 10 = 1500.

This is inaccurate because, in most cases you can not ascertain how many distinct stores and product data will come on each load. You might have 150 stores and 10 products, but there is no guarantee that all the product will come on all the load. Hence the cache size you determine in this method is quite approximate.

You can, however, calculate the cache size in both the two methods discussed here and take the max of the values to be in safer side.

Next, read How to optimize the performance of Informatica Lookup transformations >>

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Implementing SCD2 in Informatica Using ORA_HASH at Source

    In this article we shall see how we can implement SCD type2 in Informatica using ORA_HASH, which is an ORACLE function that computes hash value for a given expression. We can use this feature to find the existence of any change in any of the SCD...

  • APEAR - A tool for automating Informatica Performance Tuning

    DWBIConcepts is launching APEAR – an Automated Performance Evaluation and Reporting tool for Informatica. As the name suggests, this tool will help you tune the performance of Informatica sessions fully automatically. Now don't waste your precious...

  • Using Informatica Normalizer Transformation

    Normalizer transformation is a native transformation in Informatica that can ease many complex data transformation requirements. Learn how to effectively use normalizer in this tutorial.

  • Informatica Dynamic Lookup Cache

    A LookUp cache does not change its data once built. But what if the underlying table upon which lookup was done changes the data after the lookup cache is created? Is there a way so that the cache always remain up-to-date even if the underlying...

  • Informatica Tuning - Step by Step Approach

    This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

  • Loading Flatfiles delimited by comma and double quotes

    In this article let us take up a very trivial but an important aspect that we as DW developers usual face. This is related to loading flat file sources. Whenever we have flat file sources we usual ask source systems for a specific type of field...

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • Calling C executable from Java Transform

    We are going to do is, to call C++ Executable from Informatica, using Passive Java Transform and capture the output of the C++ using Java and write the result to corresponding target column.

  • How to Tune Performance of Informatica Joiner Transformation

    Joiner transformation allows you to join two heterogeneous sources in the Informatica mapping. You can use this transformation to perform INNER and OUTER joins between two input streams. For performance reasons, I recommend you ONLY use JOINER...

  • The benefit and disadvantage of Informatica Persistent Cache Lookup

    Persistent cache may be your choice of caching when it comes to lookup performance. But you should be aware of the hazards of persistent cache as well.