A persistent cache may be your caching method of choice when it comes to lookup performance, but you should be aware of its hazards as well.

Overview

If you enable caching in a Lookup transformation, the Integration Service builds a cache in memory to store the lookup data. If the lookup data does not change between sessions, you can configure the transformation to use a persistent lookup cache. When you run a session that uses a persistent lookup cache, the Integration Service rebuilds the cache only if any cache file is missing or invalid. For details, please check How to Implement Informatica Persistent Cache.
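
To make this build-or-reuse behaviour concrete, here is a minimal conceptual sketch in Python. This is not how PowerCenter is actually implemented (real persistent caches live in binary .idx/.dat files managed by the Integration Service); the pickle file, the signature check, and the function names below are assumptions for illustration only.

    import os
    import pickle

    CACHE_FILE = "lookup_cache.pkl"   # hypothetical stand-in for the .idx/.dat cache files

    def get_lookup_cache(query_lookup_table, cache_signature):
        """Reuse the persisted cache unless its file is missing or invalid."""
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, "rb") as f:
                signature, cache = pickle.load(f)
            if signature == cache_signature:      # "valid": lookup definition unchanged
                return cache                      # reuse; no time spent re-querying
        cache = query_lookup_table()              # missing or invalid: rebuild
        with open(CACHE_FILE, "wb") as f:
            pickle.dump((cache_signature, cache), f)
        return cache

    # First call queries the lookup table; subsequent runs reuse the saved file.
    keys = get_lookup_cache(lambda: {"CUST-001": 101}, cache_signature="v1")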

Benefits of Persistent Cache Lookup

  • Depending on your situation, a persistent cache can give you a huge performance boost at session runtime, since the Integration Service does not need to rebuild the lookup cache on every run, thereby saving time.
  • The very fact that the data in a persistent cache file does not refresh or change between session runs can be used to overcome major functional hurdles.

Below I illustrate one problematic situation that is often encountered in long-running ETL loads of fact tables, and how it can be overcome using persistent cache lookups.

Problem: Ensuring Data Integrity in a Long-Running Load When Underlying Tables Are Changing

Suppose you have a requirement to produce cross-subject-area reports combining two fact tables from two different data marts – Sales and Marketing. The loading of both facts starts on the first day of the month; however, the sales fact is loaded first and the marketing fact load starts after it.

Now suppose the sales fact, being a very high-volume fact, takes more than a day to load, and the surrogate keys flowing into this fact from various SCD Type-II dimension tables may change in the meantime during the daily dimension refresh. That means the surrogate key of a customer populated in the marketing fact will not necessarily match the surrogate key of the same customer populated in the sales fact, since the two facts are loaded on different days and the key may have changed in between.

Overcoming the Issue Using a Persistent Cache Lookup

How do we overcome this issue? If you look closely, the issue basically boils down to ensuring that both facts get the same surrogate keys. That is easily achieved by using a persistent cache lookup: because the cache does not refresh during the daily loads, both facts resolve their keys against the same frozen snapshot even if the underlying dimension tables change in between, as sketched below.
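
Here is a minimal sketch of the idea in Python; the dimension, natural keys, and surrogate key values are all invented for illustration:

    # Day 1: snapshot the SCD Type-II dimension's current surrogate keys once.
    dim_customer = {"CUST-001": 101, "CUST-002": 102}   # natural key -> surrogate key
    frozen_keys = dict(dim_customer)                    # plays the persistent cache

    # Sales fact load (day 1) resolves keys from the frozen snapshot.
    sales_key = frozen_keys["CUST-001"]                 # 101

    # Day 2: the daily refresh closes the old dimension row and issues a new key.
    dim_customer["CUST-001"] = 205

    # Marketing fact load (day 2) still reads the frozen snapshot, not the live
    # dimension, so both facts carry the same surrogate key for CUST-001.
    marketing_key = frozen_keys["CUST-001"]             # 101, same as the sales fact

    assert sales_key == marketing_key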

Disadvantages of Persistent Cache Lookup

Although a persistent cache can give you considerable performance and other advantages, it comes with some hazards.

  • Recovering a session that fails midway may not be possible. Consider this: you have a typical update-else-insert logic on a target table, over which you also have a persistent cache lookup. This lookup on the target is used to determine whether a record coming from the source is already present in the target (update) or not (insert). Suppose the session fails after inserting a few records into the target. If this were a normal lookup, you could simply restart the session after fixing the cause of the error. But with a persistent cache lookup, you cannot restart the job directly, because the cache will not hold the records that were inserted in the first run; the lookup will therefore fail to detect that those records already exist in the target, and your mapping will try to insert them once again (see the sketch after this list).
  • A persistent cache gives you a performance boost by saving the cache-build time at the start of the session, but using the lookup data during the session run still takes the same time as a normal lookup. It is often observed that a persistent cache shared across many sessions creates heavy disk-level I/O contention when the lookup is actually being used. If you see a large transformation bottleneck with no apparent cause in sessions using persistent cache lookups, monitor the disk I/O performance using “iostat” or “vmstat” (UNIX).
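
To make the restart hazard concrete, here is a minimal Python sketch. The pickle file stands in for the binary cache files the Integration Service manages, and the table and file names are assumptions for illustration only:

    import os
    import pickle

    CACHE_FILE = "target_keys.pkl"   # hypothetical stand-in for the persistent cache

    target_table = []                # rows already loaded into the target

    def load_or_build_cache():
        """Reuse the persisted cache if its file exists; otherwise build and save it."""
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, "rb") as f:
                return pickle.load(f)                     # possibly stale snapshot
        cache = {row_id for row_id, _ in target_table}    # fresh build from the target
        with open(CACHE_FILE, "wb") as f:
            pickle.dump(cache, f)
        return cache

    def run_session(source_rows):
        cache = load_or_build_cache()
        for row_id in source_rows:
            if row_id in cache:
                pass                                      # update path (omitted)
            else:
                target_table.append((row_id, "inserted")) # insert path
                # NOTE: the on-disk persistent cache is never refreshed mid-session

    if os.path.exists(CACHE_FILE):
        os.remove(CACHE_FILE)        # start from a clean slate for the demo

    run_session([1, 2])              # first run inserts rows 1 and 2, then "fails"
    run_session([1, 2, 3])           # restart: the stale cache misses rows 1 and 2,
                                     # so they are inserted again as duplicates
    print(target_table)              # rows 1 and 2 now appear twice

In a real target table these repeated inserts would typically fail with primary-key violations or silently create duplicates, which is why a failed session using a persistent cache lookup usually requires rebuilding the cache (or reprocessing from a clean state) before restart.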

