Joiner transformation allows you to join two heterogeneous sources in the Informatica mapping. You can use this transformation to perform INNER and OUTER joins between two input streams.

For performance reasons, I recommend you ONLY use JOINER transformation if any of the following condition is true –

  1. Data sources are not relational in nature (joining two flat files)
  2. Data sources are heterogeneous in nature (e.g. Flat File and Database)
  3. Data sources are coming from different relational connections
  4. One or more column(s) used in the joining condition(s) of the JOINER is derived in the mapping

Except the above mentioned conditions, it is better to use database side joins. To know why, please read –

Database performs join faster than Informatica

Although the article in the above link is particular about Oracle database but the conclusion will hold true in case of most of the other databases.

Tuning JOINER Transformation

However, if you have to use the joiner transformation, following are the additional points/actions that you must consider:

  • When joining between two data sources, treat the data source containing less number of records as Master. This is because the Cache size of the Joiner transformation depends on master data (unless sorted input with the same source is used).
  • Ensure that both the master and detail input sources are sorted and both “Sorted Input” and “Master Sort Order” ports are checked and set
  • Consider using cache partitioning for Joiner transformation if you have partition option available under your Informatica license. More details on this topic later
  • Check if the Data and Index cache sizes can be configured. More details on this next

Understanding Joiner Cache

Joiner Transformation needs a space to store the cache for the data and index. This cache can be either memory cache (stored in RAM) or disk cache (stored in hard drive disks) or both depending on various factors which I won’t discuss now. Obviously the memory cache is much faster than the disk cache. So enough system memory should be available to Informatica for faster Join operation. One can actually configure the amount of memory for Joiner data cache and index cache by the following two options under Joiner Transformation Properties:

  1. Joiner Data Cache Size
  2. Joiner Index Cache Size

If you click on these properties under the “Mapping” tab of the session, you can access “Joiner-Cache Calculator” which is a small tool inbuilt into the PowerCenter Manager for calculating the required amount of cache sizes for the joining operation. You can use the values suggested by this calculator for joiner data and index cache or you can leave them as AUTO. If you do not leave them as Auto and input some values in those boxes, you must ensure that the allocated memory is available while the session executes. Otherwise the session will fail.

I prefer to leave Joiner Data and Index Cache sizes parameters to Auto and set the maximum values for the auto memory attributes in the session level. To know why, please read on.

Partitioning the Joiner cache

If the session containing the joiner is already partitioned, then one can take the advantage of cache partitioning for the Joiner. As the name suggests, the joiner cache itself gets divided in different partitions in this case. The benefit of this is Informatica accesses the cache in parallel for each partition which is faster than accessing the full cache in serial.

In order to achieve this cache partition, you need to create a partition point on the Joiner Transformation by choosing the “Add Partition Point” option in Session level under the Mapping::Partitions tab. Then you can edit that partition point to add more than one Hash Auto-key partitions to it. Suppose if you add 4 hash auto key partitions to the Joiner transformation then, by default, Informatica will automatically add 4 Pass-Through partitions in the source qualifier transformations of *both* the master and detail pipelines.

The benefit of choosing Hash Auto Keys partition in the Joiner transformation is – you need NOT explicitly tell Informatica how to divide the incoming data to individual partitions in Joiner level. You set it to Hash Auto Keys and you forget it, Informatica will take care for the rest.

However, as soon as you add number of Hash Auto Keys partition to the Joiner level, your source qualifiers also get Pass-Through partitioned. Here you may override the Source Qualifier query and specify individual SQL queries for each partition in the source qualifier transformation level. Supposing your original source side SQL query is like below:

SELECT AccountID, TransactionType, TransactionAmount 
FROM Transactions

You can override the above query for each partition level like below:

Partition #1:

SELECT AccountID, TransactionType, TransactionAmount 
FROM Transactions 
WHERE AccountType = ‘SAVINGS’

Partition #2:

SELECT AccountID, TransactionType, TransactionAmount 
FROM Transactions 
WHERE AccountType = ‘CURRENT’

Partition #3:

SELECT AccountID, TransactionType, TransactionAmount 
FROM Transactions 
WHERE AccountType = ‘CHECK-IN’

Partition #4:

SELECT AccountID, TransactionType, TransactionAmount 
FROM Transactions 
WHERE AccountType = ‘DEMAT’

The above method ensures that each of your source qualifier partition is populated with different set of data.

Alternatively you may also change the partition type in source qualifier level from Pass-Through to “Key range” and specify start and end range of values for each partition. You can also change the partition type to “Database Partitioning” if your source database is Oracle or DB2.

Another important point to note here is – if you add SORTER transformation before Joiner (which you should always do if your data is not already sorted from source) – then you should also consider creating partition points and adding same number of partitions to the sorter transformation. If the partition type at the sorter level is Hash Auto Key, then you need not add any partition point in the Joiner Transformation level

Based on whether your joiner data is sorted or not and the cache is partitioned or not, different number of cache(s) will be created by Informatica as shown below:

Number of Cache(s) in Disk Number of Cache(s) in Memory
Data Sorted Only one Equal to number of partitions
Data Un-sorted/ Not Partitioned Only one Only one
Data Un-sorted/ Partitioned Equal to number of partitions Equal to number of partitions

So, this is all about tuning a joiner transformation.

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Informatica Tuning - Step by Step Approach

    This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

  • Using Informatica Stored Procedure Transformation

    Stored Procedure Transformation - as the name suggests is used to execute stored procedures through Informatica ETL. It can also be used to call functions to return calculated values. The Stored Procedures that are to be executed should be...

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • APEAR - A tool for automating Informatica Performance Tuning

    DWBIConcepts is launching APEAR – an Automated Performance Evaluation and Reporting tool for Informatica. As the name suggests, this tool will help you tune the performance of Informatica sessions fully automatically. Now don't waste your precious...

  • Calling C executable from Java Transform

    We are going to do is, to call C++ Executable from Informatica, using Passive Java Transform and capture the output of the C++ using Java and write the result to corresponding target column.

  • Implementing Informatica Incremental Aggregation

    Using incremental aggregation, we apply captured changes in the source data (CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture the changes, then we can configure the session to process those...

  • CDC Implementation using Flatfile

    This article shows how to use a flatfile to implement Change data Capture. Suppose we want to maintain the last extraction date in a flatfile, based on that value we want to capture the changed data of our business table.

  • Informatica Reject File - How to Identify rejection reason

    When we run a session, the integration service may create a reject file for each target instance in the mapping to store the target reject record. With the help of the Session Log and Reject File we can identify the cause of data rejection in the...

  • Useful Informatica Metadata Repository Queries

    Informatica metadata repository stores and maintains information about all the objects in Informatica. They contain details of connection information, users, folders, mappings, sources, targets etc. These information can serve many purposes while...

  • PowerCenter SOA Components

    PowerCenter has a Service-Oriented Architecture that provides the ability to scale services and share resources across multiple machines. Let us know more about the components and services associated with Powercenter.