Using incremental aggregation, we apply captured changes in the source data (CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture the changes, then we can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to delete previous loads data, process the entire source data and recalculate the same data each time you run the session.

Incremental Aggregation

When the session runs with incremental aggregation enabled for the first time say 1st week of Jan, we will use the entire source. This allows the Integration Service to read and store the necessary aggregate data information. On 2nd week of Jan, when we run the session again, we will filter out the CDC records from the source i.e the records loaded after the initial load. The Integration Service then processes these new data and updates the target accordingly.

Use incremental aggregation when the changes do not significantly change the target.If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with entire source data and recalculate the same aggregation formula .

INCREMENTAL AGGREGATION, may be helpful in cases when we need to load data in monthly facts in a weekly basis.

Sample Mapping

Let us see a sample mapping to implement incremental aggregation:

Incremental Aggregation Mapping

Look at the Source Qualifier query to fetch the CDC part using a BATCH_LOAD_CONTROL table that saves the last successful load date for the particular mapping.

Incremental Aggregation Source Qualifier

Look at the ports tab of Expression transformation.

Incremental Aggregation Expression

Look at the ports tab of Aggregator Transformation.

Incremental Aggregation Aggregator

Now the most important session properties configuration to implement incremental Aggregation

Incremental Aggregation Session property

If we want to reinitialize the aggregate cache suppose during first week of every month we will configure the same session in a new workflow at workflow level with the Reinitialize aggregate cache property checked in.

Incremental Aggregation Session property

Example with Data

Now have a look at the source table data:


After the first Load on 1st week of Jan 2010, the data in the target is as follows:


Now during the 2nd week load it will process only the incremental data in the source i.e those records having load date greater than the last session run date. After the 2nd weeks load after incremental aggregation of the incremental source data with the aggregate cache file data will update the target table with the following dataset:

11116008201001450The cache file updated after aggretation
22226009201001500The cache file updated after aggretation
33335003201001300The cache file remains the same as before
44441234201001350New group row inserted in cache file
55556157201001500New group row inserted in cache file

Understanding Incremental Aggregation Process

The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data for that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.

Each subsequent time we run the session with incremental aggregation, we use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.

When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.

Each subsequent time we run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.

The Integration Service creates new aggregate data, instead of using historical data, when we configure the session to reinitialize the aggregate cache, Delete cache files etc.

When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.

Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Best Informatica Interview Questions & Answers

    Welcome to the finest collection of Informatica Interview Questions with standard answers that you can count on. Read and understand all the questions and their answers below and in the following pages to get a good grasp in Informatica. If you...

  • How to Tune Performance of Informatica Lookup Transformation

    To me, look-up is the single most important (and difficult) transformation that we need to consider while tuning performance of Informatica jobs. The choice and use of correct type of Look-Up can dramatically vary the session performance in...

  • Informatica Metadata Tables - Overview and Tutorial

    Informatica PowerCentre stores all the information about mapping, session, transformation, workflow etc. in a set of database tables called metadata tables. While these tables are used internally by Informatica, one can get useful information by...

  • Informatica Tuning - Step by Step Approach

    This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

  • CDC Implementation using Flatfile

    This article shows how to use a flatfile to implement Change data Capture. Suppose we want to maintain the last extraction date in a flatfile, based on that value we want to capture the changed data of our business table.

  • All about Informatica Lookup

    A Lookup is a Passive, Connected or Unconnected Transformation used to look up data in a relational table, view, synonym or flat file. The integration service queries the lookup table to retrieve a value based on the input source value and the...

  • How to Tune Performance of Informatica Aggregator Transformation

    Similar to what we discussed regarding the Performance Tuning of Joiner Transformation, the basic rule for tuning aggregator is to avoid aggregator transformation altogether...

  • Implementing Informatica Partitions

    Identification and elimination of performance bottlenecks will obviously optimize session performance. After tuning all the mapping bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the...

  • Using Informatica Normalizer Transformation

    Normalizer transformation is a native transformation in Informatica that can ease many complex data transformation requirements. Learn how to effectively use normalizer in this tutorial.

  • CDC Implementation using Informatica Variable

    This article explains the Change Data Capture mechanism using Informatica Mapping Variable. We can use the Informatica Mapping Variable to extract the CDC data without using any other custom table. Here it goes.