In the previous article, we showed how surrogate keys can be generated without using Sequence Generator transformation. However, if Informatica partitioning is implemented for such cases, then since each partition pipeline will call the lookup simultaneously, we will end up generating duplicate sequence numbers. In this article, we will see how we may resolve this issue.

This article lists down the challenges faced in real-time with the use of custom Sequencing logic using Expression while enabling Pipeline Partitioning in Informatica.

Suppose we have a transaction table JOB_ORDER_TRANS that has details about Job Orders created by the Marketing team in a product company. This transaction table has a sequence key (PK) in order to perform an update post the data load as demanded by the business and this PK is loaded using the custom sequencing logic mentioned in the following section.

This transaction table sources data from a stage table (JOB_ORDER_STG) that sends two million rows for each run i.e. two million rows are processed by ETL everyday using Incremental approach to load into this target table.

Unpartitioned Code

Existing code that loads this transaction table does not follow any partitioning strategy to read data from stage table because of which ETL takes more than one hour to complete the load, hence going beyond the agreed SLA. Sometimes, during the end of the month, the load on server that hosts the Integration Service further pushes the data load of this batch job by another 10 – 15 mins extra.

Refer below the structure of source and target tables and the sequencing logic followed in this code.

Order_key (PK)Number

The sequencing logic followed in this codes based on using “reusable expression and lookup transformation” to generate the Primary Key. We have discussed this approach in our earlier article.

Problem Statement

Since this code with Reusable Expression and Lookup based sequencing logic has performance concerns in Production, ETL development team has planned to increase the performance of the job by following the partition strategy to implement parallel data processing.

It’s decided to implement “Pass Through” partitioning at Informatica level by partitioning source data using GEO_CD column (Geography Id) from Stage source. Implementing partitioning at Informatica level enables parallel processing and the mapping to load Stage to Transaction table run multiple instances, one for each Geo Code partition.

Refer the article on Informatica Partitions to know about Informatica Partition options.

Challenges in using existing Sequence Logic

If we implement Pipeline partitioning at Informatica level, parallel data processing will create a pipeline for each partition. When the session runs, pipeline of individual partition runs in parallel and improves the session performance.

With the previously described sequencing logic, each pipeline partition calls the reusable lookup, get the same maximum value from the target table and generate sequence using Reusable expression. This behaviour generates duplicate sequence values across partition pipelines since each partition pipeline calls the lookup simultaneously and generates sequence in parallel.

Refer below the duplicate PKs loaded in the table.

1100105650088109736189801-04-2016 19:04
2100105750088284752456701-04-2016 19:04
1500206450088203842437801-04-2016 19:00
2500206781088139471680301-04-2016 19:04

Solution - Hybrid Sequencing

In order to overcome this issue of loading duplicate PK values in target table, a hybrid approach to generate sequence that uses the combination of below logic is followed.

  • Sequencing using Reusable Expression and Lookup (discussed above)
  • Sequence Generator (with Reset option)

Reason why this Hybrid approach is followed instead of just using Sequence Generator alone is to avoid draining of sequence keys in Production as the source volume is incremental for every run. Implementing Hybrid Sequencing requires minor tweaks on Reusable Lookup and Expression transformations and a new Sequence Generator transformation.

Reusable Lookup (Hybrid)

Reusable lookup has one I/P port (dummy), one Return Port to return the maximum Primary Key value and one Lookup port to match with I/P dummy port. There is no change on the ports.

But, Lookup Override returns 0 instead of 1 as Max(PK). It’s the only change on Lookup transformation. Lookup Override used in this transformation is shown below


Reusable Expression (Hybrid)

This Reusable expression has a minor change to accept NextVal from Sequence Generator to add it with Max (PK) from Reusable Lookup.

Port Expressions:


I_NEXTVAL refers the input port from Sequence Generator.

Sequence Generator:

Configure the Sequence Generator by setting “Reset” option so that the sequence will be reset to start from 1 for every run.

Mapping Snapshot:

Below picture shows the usage of hybrid sequencing approach in the mapping that loads Job Order Transaction table with pipeline partitioning enabled.

Refer below the PK loaded without duplicates as a result of Hybrid Implementation:

1100105650088109736189801-04-2016 19:04
2100105750088284752456701-04-2016 19:04
3500206450088203842437801-04-2016 19:00
4500206781088139471680301-04-2016 19:04

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • Informatica Reject File - How to Identify rejection reason

    When we run a session, the integration service may create a reject file for each target instance in the mapping to store the target reject record. With the help of the Session Log and Reject File we can identify the cause of data rejection in the...

  • The benefit and disadvantage of Informatica Persistent Cache Lookup

    Persistent cache may be your choice of caching when it comes to lookup performance. But you should be aware of the hazards of persistent cache as well.

  • CDC Implementation using Informatica Variable

    This article explains the Change Data Capture mechanism using Informatica Mapping Variable. We can use the Informatica Mapping Variable to extract the CDC data without using any other custom table. Here it goes.

  • APEAR - A tool for automating Informatica Performance Tuning

    DWBIConcepts is launching APEAR – an Automated Performance Evaluation and Reporting tool for Informatica. As the name suggests, this tool will help you tune the performance of Informatica sessions fully automatically. Now don't waste your precious...

  • Generate Surrogate Key without using Sequence Generator

    It is possible to generate sequential surrogate key in the target table without the use of an Informatica Sequence Generator transformation. Using this option, one can avoid any gap in the sequence numbers of the surrogate key.

  • What is Active Lookup Transformation

    Informatica 9x allows us to configure Lookup transformation to return multiple rows. So now we can retrieve multiple rows from a lookup table thus making Lookup transformation an Active transformation type.

  • Working with Informatica Flatfiles

    In this article series we will try to cover all the possible scenarios related to flatfiles in Informatica.

  • Implementing Informatica Partitions

    Identification and elimination of performance bottlenecks will obviously optimize session performance. After tuning all the mapping bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the...

  • Informatica Performance Tuning - Complete Guide

    This article is a comprehensive guide to the techniques and methodologies available for tuning the performance of Informatica PowerCentre ETL tool. It's a one stop performance tuning manual for Informatica.