Identification and elimination of performance bottlenecks will obviously optimize session performance. After tuning all the mapping bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

PowerCenter Informatica Pipeline Partition

The PowerCenter® Partitioning Option increases the performance of PowerCenter through parallel data processing. This option provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments.

There are different Types of Informatica Partitions, eg.

  1. Database partitioning
  2. Hash auto-keys
  3. Hash user keys
  4. Key range
  5. Pass-through
  6. Round-robin

Informatica Pipeline Partitioning Explained

Each mapping contains one or more pipelines. A pipeline consists of a source qualifier, all the transformations and the target. When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel. Basically a partition is a pipeline stage that executes in a single reader, transformation, or writer thread.

The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage. If we have the Informatica Partitioning option, we can configure multiple partitions for a single pipeline stage.

Setting partition attributes includes partition points, the number of partitions, and the partition types. In the session properties we can add or edit partition points. When we change partition points we can define the partition type and add or delete partitions(number of partitions).

We can set the following attributes to partition a pipeline:

  1. Partition point:

    Partition points mark thread boundaries and divide the pipeline into stages. A stage is a section of a pipeline between any two partition points. The Integration Service redistributes rows of data at partition points. When we add a partition point, we increase the number of pipeline stages by one. Increasing the number of partitions or partition points increases the number of threads.

    We cannot create partition points at Source instances or at Sequence Generator transformations.

  2. Number of partitions:

    A partition is a pipeline stage that executes in a single thread. If we purchase the Partitioning option, we can set the number of partitions at any partition point. When we add partitions, we increase the number of processing threads, which can improve session performance. We can define up to 64 partitions at any partition point in a pipeline. When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline. The number of partitions remains consistent throughout the pipeline. The Integration Service runs the partition threads concurrently.

  3. Partition types:

    The Integration Service creates a default partition type at each partition point. If we have the Partitioning option, we can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points.

    We can define the following partition types here: Database partitioning, Hash auto-keys, Hash user keys, Key range, Pass-through, Round-robin.

    • Database partitioning:

      The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.

    • Pass-through:

      The Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when we want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

    • Round-robin:

      The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where we want each partition to process approximately the same numbers of rows i.e. load balancing.

    • Hash auto-keys:

      The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key. The Integration Service uses all grouped or sorted ports as a compound partition key. We may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.

    • Hash user keys:

      The Integration Service uses a hash function to group rows of data among partitions. We define the number of ports to generate the partition key.

    • Key range:

      The Integration Service distributes rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.

Points to consider while using Informatica partitions

  • We cannot create a partition key for hash auto-keys, round-robin, or pass-through types partitioning
  • If you have bitmap index defined upon the target and you are using pass-through partitioning to, say Update the target table - the session might fail as bitmap index creates serious locking problem in this scenario
  • Partitioning considerably increases the total DTM buffer memory requirement for the job. Ensure you have enough free memory in order to avoid memory allocation failures
  • When you do pass-through partitioning, Informatica will try to establish multiple connection requests to the database server. Ensure that database is configured to accept high number of connection requests
  • As an alternative to partitioning, you may also use native database options to increase degree of parallelism of query processing. For example in Oracle database you can either specify PARALLEL hint or alter the DOP of the table in subject.
  • If required you can even combine Informatica partitioning with native database level parallel options - e.g. you create 5 pass-through pipelines, each sending query to Oracle database with PARALLEL hint.

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • How to get Informatica Repository Information from Metadata

    Some metadata views can be very handy to get Informatica Repository Information. Know How.

  • CDC Implementation using Informatica Variable

    This article explains the Change Data Capture mechanism using Informatica Mapping Variable. We can use the Informatica Mapping Variable to extract the CDC data without using any other custom table. Here it goes.

  • Generate Surrogate Key without using Sequence Generator

    It is possible to generate sequential surrogate key in the target table without the use of an Informatica Sequence Generator transformation. Using this option, one can avoid any gap in the sequence numbers of the surrogate key.

  • CDC Implementation using Flatfile

    This article shows how to use a flatfile to implement Change data Capture. Suppose we want to maintain the last extraction date in a flatfile, based on that value we want to capture the changed data of our business table.

  • When to use Informatica Stored Procedure Transformation

    There are loads of mis-information spreaded across Internet on good use-cases of Informatica Stored Procedure transformation. Exactly where do you use this transformation? This article finds out.

  • The benefit and disadvantage of Informatica Persistent Cache Lookup

    Persistent cache may be your choice of caching when it comes to lookup performance. But you should be aware of the hazards of persistent cache as well.

  • Informatica Tuning - Step by Step Approach

    This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

  • Informatica Java Transformation

    Feel the Power of Java programming language to transform data in PowerCenter Informatica. Java Transformation in Informatica can be used either in Active or Passive Mode.

  • Pushdown Optimization In Informatica

    Pushdown Optimization which is a new concept in Informatica PowerCentre, allows developers to balance data transformation load among servers. This article describes pushdown techniques.

  • Informatica Metadata Tables - Overview and Tutorial

    Informatica PowerCentre stores all the information about mapping, session, transformation, workflow etc. in a set of database tables called metadata tables. While these tables are used internally by Informatica, one can get useful information by...