In the previous article, we showed how surrogate keys can be generated without using the Sequence Generator transformation. However, if Informatica partitioning is implemented in such cases, each partition pipeline calls the lookup simultaneously, and we end up generating duplicate sequence numbers. In this article, we will see how to resolve this issue.

This article describes the real-world challenges of using custom sequencing logic built with an Expression transformation when Pipeline Partitioning is enabled in Informatica.

Suppose we have a transaction table JOB_ORDER_TRANS that holds details of Job Orders created by the Marketing team in a product company. The table has a sequence key (PK), which the business requires in order to perform updates after the data load; this PK is populated using the custom sequencing logic described in the following section.

This transaction table sources its data from a stage table (JOB_ORDER_STG) that supplies two million rows per run, i.e. the ETL processes two million rows every day using an incremental approach to load this target table.

Unpartitioned Code

The existing code that loads this transaction table does not follow any partitioning strategy when reading from the stage table; as a result, the ETL takes more than one hour to complete the load, breaching the agreed SLA. At month-end, the additional load on the server hosting the Integration Service delays this batch job by another 10–15 minutes.

The structure of the source and target tables, and the sequencing logic followed in this code, are shown below.

Column           Datatype
Order_key (PK)   Number

The sequencing logic followed in this code is based on using a reusable Expression and Lookup transformation to generate the primary key. We discussed this approach in our earlier article.

Problem Statement

Since this code, with its reusable Expression and Lookup based sequencing logic, has performance concerns in Production, the ETL development team plans to improve the job's performance by adopting a partitioning strategy for parallel data processing.

It is decided to implement “Pass Through” partitioning at the Informatica level, partitioning the source data on the GEO_CD column (Geography Id) of the stage table. Partitioning at the Informatica level enables parallel processing: the mapping that loads the stage table into the transaction table runs multiple pipeline instances, one for each Geo Code partition.

Refer to the article on Informatica Partitions to learn about the partitioning options Informatica offers.

Challenges in using existing Sequence Logic

If we implement Pipeline partitioning at the Informatica level, parallel data processing creates a pipeline for each partition. When the session runs, the pipelines of the individual partitions run in parallel, improving session performance.

With the previously described sequencing logic, each partition pipeline calls the reusable Lookup, gets the same maximum value from the target table, and generates its sequence using the reusable Expression. Because every partition starts from the same maximum and generates values in parallel, duplicate sequence values are produced across the partition pipelines.
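The failure mode can be sketched in a few lines of Python. This is a hypothetical simulation: the Informatica Lookup and Expression transformations are mimicked by plain functions, and the names are illustrative only.

```python
# Simulate the lookup-plus-expression sequencing logic running under
# two pipeline partitions. In Informatica terms: the Lookup returns
# Max(PK) from the target, and a variable port in the Expression
# increments from that value for every row.

def expression_sequencer(start_max):
    """Mimics the reusable Expression: a counter that increments
    from the Max(PK) returned by the reusable Lookup."""
    current = start_max

    def next_key():
        nonlocal current
        current += 1
        return current

    return next_key

# Both partition pipelines call the lookup at the same time,
# so both see the same Max(PK) from the target table.
max_pk_at_start = 1000
partition_1 = expression_sequencer(max_pk_at_start)
partition_2 = expression_sequencer(max_pk_at_start)

keys_p1 = [partition_1() for _ in range(3)]  # [1001, 1002, 1003]
keys_p2 = [partition_2() for _ in range(3)]  # [1001, 1002, 1003]

# The two partitions generate overlapping primary keys.
assert set(keys_p1) & set(keys_p2) == {1001, 1002, 1003}
```

Each partition independently produces 1001, 1002, 1003, which is exactly the duplicate-PK pattern shown in the table below.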

The duplicate PKs loaded into the table are shown below.

Order_key  Geo_cd  (remaining key columns)  Load date
1          1001    056500881097361898       01-04-2016 19:04
2          1001    057500882847524567       01-04-2016 19:04
1          5002    064500882038424378       01-04-2016 19:00
2          5002    067810881394716803       01-04-2016 19:04

Solution - Hybrid Sequencing

To overcome this issue of duplicate PK values in the target table, a hybrid approach to sequence generation is followed, combining the logic below.

  • Sequencing using Reusable Expression and Lookup (discussed above)
  • Sequence Generator (with Reset option)

The reason for following this hybrid approach, instead of using the Sequence Generator alone, is to avoid exhausting sequence keys in Production, since the source volume is incremental for every run. Implementing hybrid sequencing requires minor tweaks to the reusable Lookup and Expression transformations, plus a new Sequence Generator transformation.

Reusable Lookup (Hybrid)

The reusable Lookup has one input port (dummy), one return port that returns the maximum primary key value, and one lookup port that matches the dummy input port. The ports themselves are unchanged.

However, the Lookup Override now returns 0 instead of 1 as the default Max(PK) when the target table is empty, because the Sequence Generator's NextVal already starts at 1 (so the first generated key is 0 + 1 = 1). This is the only change to the Lookup transformation.
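The original screenshot of the override is not reproduced here. Assuming the table and column names introduced earlier (JOB_ORDER_TRANS, ORDER_KEY), a sketch of such an override might look like the following; it is an illustration, not the article's exact SQL:

```sql
-- Hypothetical Lookup Override for the hybrid approach:
-- default to 0 (not 1) when the target is empty, so that the
-- first key becomes 0 + NextVal = 1.
SELECT NVL(MAX(ORDER_KEY), 0) AS ORDER_KEY
FROM   JOB_ORDER_TRANS
```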


Reusable Expression (Hybrid)

This reusable Expression has one minor change: it accepts NextVal from the Sequence Generator and adds it to the Max(PK) returned by the reusable Lookup.

Port Expressions:


I_NEXTVAL refers to the input port fed by the Sequence Generator.
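The original screenshot of the port expressions is not reproduced here. Assuming illustrative port names (only I_NEXTVAL appears in the article; the others are hypothetical), the change amounts to a single output-port expression:

```
-- Reusable Expression (hybrid) - illustrative port names
-- I_NEXTVAL  : input port fed by the Sequence Generator's NextVal
-- LKP_MAX_PK : Max(PK) returned by the reusable Lookup
O_ORDER_KEY = LKP_MAX_PK + I_NEXTVAL
```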

Sequence Generator:

Configure the Sequence Generator with the “Reset” option enabled, so that the sequence restarts from 1 for every run.
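The end-to-end hybrid behaviour can be sketched as a hypothetical Python simulation. Note one simplification: a real Sequence Generator hands out NextVal values to partitions in cached blocks, so the interleaving differs, but the values remain distinct across partitions, which is all that uniqueness requires.

```python
# Simulate hybrid sequencing: PK = Max(PK) + NextVal, where NextVal
# comes from a single Sequence Generator shared by all partitions
# and reset to 1 at the start of every session run.
import itertools

def run_session(current_max, rows_per_partition):
    """One session run: the Sequence Generator resets to 1 and hands
    distinct NextVal values to every partition pipeline."""
    nextval = itertools.count(1)  # "Reset" option: restart at 1 each run
    keys = {}
    for partition, n_rows in rows_per_partition.items():
        # Every partition adds the SAME Max(PK) to a DISTINCT NextVal.
        keys[partition] = [current_max + next(nextval) for _ in range(n_rows)]
    return keys

# First run against an empty target: the Lookup Override returns 0,
# so the keys start at 0 + 1 = 1.
run1 = run_session(current_max=0,
                   rows_per_partition={"GEO_1001": 2, "GEO_5002": 2})
assert sorted(k for ks in run1.values() for k in ks) == [1, 2, 3, 4]

# Next run: Max(PK) is now 4 and NextVal resets to 1, so the keys
# continue at 5, 6, ... instead of draining a long-running sequence.
run2 = run_session(current_max=4,
                   rows_per_partition={"GEO_1001": 1, "GEO_5002": 1})
assert sorted(k for ks in run2.values() for k in ks) == [5, 6]
```

Because NextVal is unique across partitions within a run, and Max(PK) advances between runs while the generator resets, the keys stay both duplicate-free and compact.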

Mapping Snapshot:

The picture below shows the hybrid sequencing approach in the mapping that loads the Job Order transaction table with pipeline partitioning enabled.

The PKs loaded without duplicates as a result of the hybrid implementation are shown below:

Order_key  Geo_cd  (remaining key columns)  Load date
1          1001    056500881097361898       01-04-2016 19:04
2          1001    057500882847524567       01-04-2016 19:04
3          5002    064500882038424378       01-04-2016 19:00
4          5002    067810881394716803       01-04-2016 19:04

