Map CDC Operation is used to support relational or mainframe changed-data capture (CDC). This transform reads the source DML changes i.e. INSERT/UPDATE/DELETE and reflects the same in the target system. So the transform maps all the source rows logged as insert to an insert, all updates to update and the deletes to delete in the target table.

Whenever a CDC based source table is imported in Data Services, it generates two more additional columns namely DI_OPERATION_TYPE and DI_SEQUENCE_NUMBER.

  • The DI_OPERATION_TYPE column holds the corresponding database DML operation codes for insert,update & delete. Possible OP codes are I - Insert, B - Before Image of update, U - Update, D - Delete
  • The DI_SEQUENCE_NUMBER column holds the information of the order/sequence of the database DML operations. For example if we want to update or delete the row first and then insert it, it doesn't makes sense; Hence the execution order has to be maintained. The order of DML command execution is very important.

Transform Properties

  • Sequencing column should be mapped to the column which specifies an integer used to order table rows or DML execution ordering in source.
  • Row operation column should be mapped to a column which specifies the database DML operation codes.
  • Input already sorted by sequencing column - This transform by default assumes that the input data is already sorted based on the value selected in the Sequencing column box. If we deselect this check box, Data Services will re-sort the input data based on the Sequencing column. Use the re-sort capability of this transform only when necessary as it impacts job performance.

So lets consider a scenario where the Source relational database Oracle/MSSQL is having CDC enabled and a particular table have been selected for CDC; Next we want to reflect these changes to a same table structure in another remote target database. In this case we can use Map_CDC_Operation transform.

Implementation

Let us consider we have a CDC based source table called PUBLISHER and we want to replicate the same to a different database using Map_CDC_Operation. Find below the implementation Job & Dataflow for the same:

Sample Job for Map_CDC_Operation

Sample Dataflow for Map_CDC_Operation

1. The initialization script is used to get the maximum value of DI_SEQUENCE_NUMBER extracted during last execution time. After each execution the value of DI_SEQUENCE_NUMBER is preserved in a Replication control table.

Initialization Script

#MSSQL Replication Control Table

$DI_SEQUENCE_NUMBER = cast( sql( 'DS_TGT_REPO', 
'SELECT ISNULL( MAX( DI_SEQUENCE_NUMBER ), -1 ) FROM DW_LOAD_CONTROL 
WHERE TARGET_NAME = {$TARGET_NAME}' ), 'int' );

2. Here PUBLISHER is our MSSQL CDC based Source table. The Query transform extracts only those tuples where the DI_SEQUENCE_NUMBER is greater than the last maximum DI_SEQUENCE_NUMBER extracted during last run. Also we have selected DI_SEQUENCE_NUMBER column in the ORDER BY tab as sort order Ascending

Query Extraction

3. Next we have the Map_CDC_Operation transform followed by the target table which is a replica of the Source DB table.

Map_CDC_Operation transform

The above functionality can also be achieved with a CASE transform plus three Map_Operation transforms, although the Map_CDC_Operation is just more convenient.

CDC based source Traditional method

NOTE

  • While using a relational or mainframe CDC source table, the DI_OPERATION_TYPE & DI_SEQUENCE_NUMBER columns are automatically selected as the Row operation column and the Sequencing column of the Map_CDC_Operation respectively.
  • While using a relational or mainframe CDC source table, the DI_OPERATION_TYPE & DI_SEQUENCE_NUMBER columns appear in the input schema, and are automatically removed in the output schema. To propagate these columns to the output, create additional columns that map to them prior to coming into the Map_CDC_Operation transform.
  • The Relational CDC sources supported by the transform includes Oracle and SQL Server.
  • This transform is typically the last object before the target in a data flow because it produces INPUT, UPDATE and DELETE operation codes. Data Services produces a warning if other objects are used in between Map_CDC_Operation & Target.

USE CASE

A Map_CDC_Operation can be used to replicate source CDC table changes to another table of similar structure in a remote database or staging area of a data warehouse. Also we can use this transform to replicate/sync a SAP HANA database target table from/with a CDC enabled Source table.

In this context, the source based CDC tables can be optimally used to load slowly changing dimension tables of Data warehouse. In this context it is relevant to highlight that, since the DML operations codes/indicators are coming along with the source records, it might not at all be required to lookup the target dimension table anymore during dimension loading.

What happens in Slowly Changing Dimension Type1

Database Change data capture, records insert, update, and delete activity that is applied to a table. So based on the DI_OPERATION_TYPE & DI_SEQUENCE_NUMBER, we will INSERT or UPDATE records accordingly to target dimension table of SCD TYPE 1. Here we may filter out the records with DI_OPERATION_TYPE as 'D' i.e. delete at the source, because we usually don't delete records from dimension table.

What happens in Slowly Changing Dimension Type2

1. If a changed record comes from the source, which is not present in the target dimension table:

- A new entry is created in the dimension table with Active Flag marked as True/Yes and the record validity/end date is set to NULL OR some future date like- 2050.12.31

2. If a changed record comes from the source, which is present in the target dimension table: - A new entry is created in the dimension table with Active Flag marked as True/Yes and the record validity/end date is set to NULL OR some future date like- 2050.12.31 - The existing active record for the natural key in the dimension table is modified/updated to mark the Active Flag as False/No and the record validity/end date is set to SYSDATE.

SCD Type 1 and SCD Type 2 implementation might not be achieved directly via Map_CDC_Operation. But consider a case where we have a CDC based source table and our target DW dimension table is a slowly changing of Type1 or Type2, then the question remains do we need to again lookup the target dimension table. Check out the next article on Slowly Changing Dimension Type 1 & Type 2 implementation of CDC based Source table.


Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Working with Data Services Flatfiles

    In this article we will try to cover all the possible scenarios related to flatfiles in SAP Data Services.

  • Learning SAP Data Services - Online Tutorial

    We are pleased to start this new tutorial page for SAP BusinessObjects Data Services (BODS). If you do not know SAP Data Services (BODS) yet but wish to master this ETL tool, you have come to the right place. Even if you are already familiar with...

  • GROUP RANK in Data Services

    In this article, we will learn how to implement RANK and DENSE RANK operations on GROUP in SAP Data Services.

  • Handling XML source files in SAP Data Services

    This article will demonstrate how to read data from XML based source files using SAP Data Services. Here our objective is to load employee and department information respectively from the source XML based file.

  • Data Services Scenario Questions Part 1

    In this tutorial we will discuss some scenario based questions and their solutions using SAP Data Services. This article is meant mainly for Data Services beginners.

  • Table Comparison Transform to Implement Slowly Change Dimension (SCD) in Data Service

    In this tutorial we will learn a new SAP Data services transform, known as Table Comparison Transform and we will see how we may use this transform to implement "slowly changing dimension" (SCD) Type - I. Like before, we have added a video tutorial at the...

  • Web Service Call in SAP Data Services

    This article demonstrates, how to implement external web services call in SAP Data Services.

  • Map_CDC_Operation in Data Services

    Map CDC Operation is used to support relational or mainframe changed-data capture (CDC). This transform reads the source DML changes i.e. INSERT/UPDATE/DELETE and reflects the same in the target system. So the transform maps all the source rows...

  • Data Services Scenario Questions Part 5

    In this tutorial we will discuss some scenario based questions and their solutions using SAP Data Services. This article is meant mainly for Data Services beginners.

  • Data Services Scenario Questions Part 6

    In this tutorial we will discuss some scenario based questions and their solutions using SAP Data Services. This article is meant mainly for Data Services beginners.