Informatica

PowerCenter SOA Components

Saurav Mitra

PowerCenter has a Service-Oriented Architecture (SOA) that provides the ability to scale services and share resources across multiple machines. This article describes the components and services associated with PowerCenter.

Domain

A domain is the fundamental administrative unit for Informatica nodes and services. An Informatica domain is a collection or group of nodes and services that define the Informatica platform.

The PowerCenter domain that we create while installing PowerCenter is called the Local Domain. This is the domain we access when we log in to the Administration Console. A domain consists of one or more nodes, a Service Manager, and application services.

Domain Configuration Information is stored in a set of relational database tables managed by the Service Manager and accessible to all Gateway Nodes in the domain. The domain configuration database typically stores the following domain metadata information:

  • Host names and Port numbers of all the nodes in the domain including the Master Gateway Node.
  • CPU usage for each Application Service.
  • Number of Repository Services running in the domain.
  • Native or LDAP Users and Groups.
  • Privileges and Roles assigned to users and groups in the domain.

Node

A node is a logical representation of a machine or a blade. Each node runs a Service Manager that performs domain operations on that node.

A Gateway node receives service requests from clients and routes them to the appropriate service and node. A gateway node can run Application Services. In the Administration Console, we can configure any node to serve as a gateway for a PowerCenter domain.

When we configure a domain with multiple gateway nodes, one node acts as the Master Gateway Node. The master gateway node is the entry point to a PowerCenter domain. If the master gateway node becomes unavailable, the Service Managers on the other gateway nodes elect another master gateway node.
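The election described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Service Manager actually implements it: the Node class, the elect_master function, and the tie-break rule (lowest node name wins) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    is_gateway: bool           # node is configured to serve as a gateway
    is_available: bool = True  # Service Manager is running on the node


def elect_master(nodes: List[Node], current: Optional[Node] = None) -> Optional[Node]:
    """Return the master gateway node, electing a new one only when needed."""
    # Keep the current master while it is still an available gateway.
    if current is not None and current.is_gateway and current.is_available:
        return current
    # Only available gateway nodes are candidates; worker nodes never qualify.
    candidates = [n for n in nodes if n.is_gateway and n.is_available]
    # Hypothetical tie-break for this sketch: the lowest node name wins.
    return min(candidates, key=lambda n: n.name, default=None)


nodes = [
    Node("node01", is_gateway=True),
    Node("node02", is_gateway=True),
    Node("node03", is_gateway=False),  # worker node
]
master = elect_master(nodes)
master.is_available = False  # simulate losing the master gateway node
new_master = elect_master(nodes, current=master)
print(master.name, "->", new_master.name)
```

If no gateway node is available, elect_master returns None, which mirrors the fact that a domain without a reachable gateway cannot accept service requests.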

Any node not configured to serve as a gateway is called a Worker Node. A worker node can run application services but cannot serve as a master gateway node.

A Primary Node is the node configured as the default node to run a Service Process. By default, the Service Manager starts the service process on the primary node and uses a backup node if the primary node fails. Any node that is configured to run a service process, but is not configured as the primary node, is a Backup Node.
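The primary and backup behaviour can be pictured with a small Python sketch. The function name pick_node_for_service and the rule that backup nodes are tried in their listed order are assumptions for illustration; the text above only says that a backup node is used when the primary node fails.

```python
from typing import List, Optional, Set


def pick_node_for_service(primary: str, backups: List[str],
                          available: Set[str]) -> Optional[str]:
    """Return the node that should run the service process, or None."""
    # The primary node is always preferred while it is available.
    if primary in available:
        return primary
    # Assumption of this sketch: backup nodes are tried in listed order.
    for node in backups:
        if node in available:
            return node
    # No configured node is available, so the service process cannot start.
    return None


print(pick_node_for_service("node01", ["node02", "node03"], {"node01", "node02"}))
print(pick_node_for_service("node01", ["node02", "node03"], {"node03"}))
```

The first call selects node01 because the primary node is up; the second falls back to node03 because neither node01 nor node02 is available.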

Grid & HA

An alias assigned to a group of nodes to run sessions and workflows is called a Grid object. High Availability (HA) is the PowerCenter option that eliminates a single point of failure in a domain and provides minimal service interruption in the event of network or hardware failure. High availability includes resilience, failover, and recovery for services and tasks in a domain.

Service Manager

Service Manager is a service or daemon that runs on all nodes in the domain to support the application services and manages all domain operations. When we start Informatica Services on a node, we actually start the Service Manager on that node. If the Service Manager is not running, the node is not available.

Application Service

A service that runs on one or more nodes in the Informatica domain is called an Application Service. A service process is the run-time representation of a service running on a node. We can create, manage, and configure application services in Informatica Administrator or through the infacmd command program, based on the environment requirements.

An application service that we associate with another application service is called an Associated Service. For example, we associate a Repository Service with an Integration Service.

A service that depends on another service to run processes is called a Dependent Service. For example, the Integration Service cannot run workflows if the Repository Service is not running.
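The dependency rule can be expressed as a simple check. The sketch below is illustrative only: the service names IS_dev and RS_dev and the can_start helper are hypothetical, and the real domain tracks service state through the Service Manager rather than through a dictionary.

```python
from typing import Dict, List, Set


def can_start(service: str, depends_on: Dict[str, List[str]],
              running: Set[str]) -> bool:
    """True if every service that `service` depends on is currently running."""
    return all(dep in running for dep in depends_on.get(service, []))


# Hypothetical names: an Integration Service that depends on a Repository Service.
depends_on = {"IS_dev": ["RS_dev"]}

print(can_start("IS_dev", depends_on, running={"RS_dev"}))  # True
print(can_start("IS_dev", depends_on, running=set()))       # False
```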

PowerCenter Repository

The PowerCenter Repository stores the information required to extract, transform, and load data in relational database tables. It also stores administrative information such as permissions and privileges for users and groups that have access to the repository. PowerCenter applications access the PowerCenter repository through the Repository Service.

The Global Repository is the hub of the repository domain. It stores common objects such as source definitions, reusable transformations, mapplets, and mappings, which makes it well suited to multi-user development environments. A Local Repository is any repository within the repository domain that is not the global repository; local repositories are used for development. In a local repository, we can create shortcuts to objects in shared folders of the global repository.

A Versioned Repository can be either a local or a global repository that has version control enabled. A versioned repository can store multiple copies, or versions, of an object. This feature allows us to efficiently develop, test, and deploy metadata into the production environment.

A group of linked repositories consisting of one Global repository and one or more Local repositories is called a Repository Domain.

Repository Service

An application service that manages the PowerCenter repository is called the Repository Service. The Repository Service is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables. The Repository Service ensures the consistency of metadata in the repository.

Metadata Manager

An application service that runs the Metadata Manager application in a PowerCenter domain is called the Metadata Manager Service. It manages access to metadata in the Metadata Manager warehouse. Metadata Manager is used to browse, analyze, and manage metadata from disparate metadata repositories. It can also be used for impact analysis, data lineage, and profiling.

Integration Service

The Integration Service is an application service that runs data integration workflows and loads metadata into the Metadata Manager warehouse. The Integration Service process accepts requests from the PowerCenter Client and from pmcmd. It manages workflow scheduling, locks and reads workflows, and starts DTM processes. The pmserver process is the Integration Service process.

When we run workflows and sessions on a grid, the Integration Service designates one Integration Service process that runs the workflow and workflow tasks as the Master Service Process. The master service process can distribute the Session, Command, and predefined Event-Wait tasks to Worker Service Processes. The master service process monitors all Integration Service processes.

DTM

The Data Transformation Manager is the process that reads, writes, and transforms data. pmdtm is the Data Transformation Manager process.

Any PowerCenter component that connects to the repository is called a repository client. This includes the PowerCenter Client, Integration Service, pmcmd, pmrep, and MX SDK.

Load Balancer

Load Balancer is a component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks across nodes in a grid. Dispatch Mode is the mode used by the Load Balancer to dispatch tasks to nodes in a grid.

The dispatch modes are:

  • Adaptive Dispatch Mode: the Load Balancer dispatches tasks to the node with the most available CPUs.
  • Metric-based Dispatch Mode: the Load Balancer checks current computing load against the resource provision thresholds and then dispatches tasks in a round-robin fashion to nodes where the thresholds are not exceeded.
  • Round-robin Dispatch Mode: the Load Balancer dispatches tasks to available nodes in a round-robin fashion up to the Maximum Processes resource provision threshold.
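To make the difference between the three dispatch modes concrete, here is a minimal Python sketch. It is a simplified model, not PowerCenter's implementation: the GridNode and LoadBalancer classes, the mode strings, and the single load_ok flag standing in for the remaining resource provision thresholds are all assumptions. The sketch also does not model CPUs being consumed as tasks start.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GridNode:
    name: str
    available_cpus: int   # idle CPUs, used by adaptive mode
    running: int          # tasks currently running on the node
    max_processes: int    # Maximum Processes threshold
    load_ok: bool = True  # stand-in for the other resource provision thresholds


class LoadBalancer:
    def __init__(self, nodes: List[GridNode], mode: str) -> None:
        self.nodes = nodes
        self.mode = mode
        self._cursor = 0  # where the next round-robin scan starts

    def _round_robin(self, eligible: Callable[[GridNode], bool]) -> Optional[GridNode]:
        # Scan each node once, starting at the cursor, and take the first eligible one.
        total = len(self.nodes)
        for offset in range(total):
            idx = (self._cursor + offset) % total
            if eligible(self.nodes[idx]):
                self._cursor = (idx + 1) % total
                return self.nodes[idx]
        return None

    def dispatch(self) -> Optional[GridNode]:
        """Pick a node for the next task, or return None if no node qualifies."""
        if self.mode == "round-robin":
            node = self._round_robin(lambda n: n.running < n.max_processes)
        elif self.mode == "metric-based":
            node = self._round_robin(
                lambda n: n.running < n.max_processes and n.load_ok)
        elif self.mode == "adaptive":
            node = max(self.nodes, key=lambda n: n.available_cpus, default=None)
        else:
            raise ValueError(f"unknown dispatch mode: {self.mode}")
        if node is not None:
            node.running += 1
        return node


grid = [
    GridNode("a", available_cpus=2, running=0, max_processes=1),
    GridNode("b", available_cpus=8, running=0, max_processes=2),
    GridNode("c", available_cpus=4, running=0, max_processes=2, load_ok=False),
]
balancer = LoadBalancer(grid, mode="metric-based")
picked = [balancer.dispatch() for _ in range(4)]
print([n.name if n else None for n in picked])
```

In this run node c is skipped because its load flag is outside the thresholds, and node a stops receiving tasks once it reaches its Maximum Processes value, so a fourth task finds no eligible node and stays undispatched.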

A domain property that establishes priority among tasks that are waiting to be dispatched is called a Service Level. When multiple tasks are waiting in the Dispatch Queue, the Load Balancer checks the service level of the associated workflow so that it dispatches high priority tasks before low priority tasks.
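A dispatch queue ordered by service level behaves like a priority queue. The following Python sketch shows the idea with the standard heapq module. The DispatchQueue class, the service level names, and the convention that a lower number means a higher priority are assumptions of this example, not details taken from the product.

```python
import heapq
from itertools import count
from typing import Dict, List, Optional, Tuple


class DispatchQueue:
    def __init__(self, service_levels: Dict[str, int]) -> None:
        # Maps a service level name to its priority (assumed: lower runs first).
        self._levels = service_levels
        self._heap: List[Tuple[int, int, str]] = []
        # A running sequence number keeps tasks of equal priority in arrival order.
        self._seq = count()

    def submit(self, task: str, service_level: str) -> None:
        priority = self._levels[service_level]
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def next_task(self) -> Optional[str]:
        """Return the waiting task with the highest priority, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Hypothetical service levels for illustration.
queue = DispatchQueue({"High": 1, "Default": 5, "Low": 10})
queue.submit("s_low", "Low")
queue.submit("s_default", "Default")
queue.submit("s_high", "High")
queue.submit("s_high_2", "High")

order = [queue.next_task() for _ in range(4)]
print(order)
```

Tasks with the same service level leave the queue in the order they arrived, so s_high is dispatched before s_high_2 even though both outrank the other two tasks.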

Log Agent

The Log Agent is a Service Manager function that provides accumulated log events from sessions and workflows. We can view session and workflow logs in the Workflow Monitor. The Log Agent runs on the nodes where the Integration Service process runs.

Log Manager

The Log Manager is a Service Manager function that provides accumulated log events from each service in the domain. We can view these logs in the Administration Console. The Log Manager runs on the master gateway node.