The Map_CDC_Operation transform supports relational and mainframe changed-data capture (CDC). It reads the DML changes (INSERT/UPDATE/DELETE) logged at the source and applies them to the target system: every source row logged as an insert becomes an insert, every update an update, and every delete a delete in the target table.

Whenever a CDC-enabled source table is imported into Data Services, two additional columns are generated: DI_OPERATION_TYPE and DI_SEQUENCE_NUMBER.

  • The DI_OPERATION_TYPE column holds the database DML operation code for each row. The possible codes are I (insert), B (before-image of an update), U (update) and D (delete).
  • The DI_SEQUENCE_NUMBER column records the order in which the DML operations occurred. This order matters: updating or deleting a row before it has been inserted makes no sense, so the original execution order must be preserved when the changes are replayed.
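To see why the sequence matters, consider this Python sketch (illustrative only, not Data Services code; the key and values are hypothetical). Applying the same three changes in sequence order yields a different, correct final state than applying them as listed:

```python
# Hypothetical CDC rows: (DI_SEQUENCE_NUMBER, DI_OPERATION_TYPE, key, value)
changes = [
    (3, "I", 101, "reinserted"),
    (1, "U", 101, "updated"),
    (2, "D", 101, None),
]

def apply_in_order(rows):
    """Apply I/U/D changes to a key/value 'table' in sequence order."""
    table = {101: "original"}  # pre-existing row in the target
    for _, op, key, value in sorted(rows, key=lambda r: r[0]):
        if op in ("I", "U"):
            table[key] = value
        elif op == "D":
            table.pop(key, None)
    return table

print(apply_in_order(changes))  # {101: 'reinserted'}
```

Replayed in listed order instead, the row would end up deleted; sorted by DI_SEQUENCE_NUMBER it correctly survives as the re-inserted version.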

Transform Properties

  • Sequencing column - map this to the column that holds the integer ordering of the source rows, i.e. the DML execution order (typically DI_SEQUENCE_NUMBER).
  • Row operation column - map this to the column that holds the database DML operation codes (typically DI_OPERATION_TYPE).
  • Input already sorted by sequencing column - by default the transform assumes the input data is already sorted by the value selected in the Sequencing column box. If we deselect this check box, Data Services re-sorts the input data on the Sequencing column. Use this re-sort capability only when necessary, as it impacts job performance.

So let's consider a scenario where the source relational database (Oracle or MSSQL) has CDC enabled and a particular table has been selected for CDC, and we want to reflect those changes to a table of the same structure in a remote target database. In this case we can use the Map_CDC_Operation transform.

Implementation

Let us consider a CDC-enabled source table called PUBLISHER that we want to replicate to a different database using Map_CDC_Operation. The implementation Job and Dataflow are shown below:

Sample Job for Map_CDC_Operation

Sample Dataflow for Map_CDC_Operation

1. The initialization script fetches the maximum value of DI_SEQUENCE_NUMBER extracted during the last execution. After each execution this value is preserved in a replication control table.

Initialization Script

#MSSQL Replication Control Table

$DI_SEQUENCE_NUMBER = cast( sql( 'DS_TGT_REPO', 
'SELECT ISNULL( MAX( DI_SEQUENCE_NUMBER ), -1 ) FROM DW_LOAD_CONTROL 
WHERE TARGET_NAME = {$TARGET_NAME}' ), 'int' );
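The control-table lookup above can be mimicked outside Data Services. The sketch below uses SQLite purely to illustrate the `ISNULL(MAX(...), -1)` pattern: on the very first run, when no row exists yet for the target, the query falls back to -1 (SQLite's equivalent of ISNULL is COALESCE). The table and column names mirror the script above; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DW_LOAD_CONTROL (TARGET_NAME TEXT, DI_SEQUENCE_NUMBER INTEGER)"
)

def last_sequence(target_name):
    # COALESCE plays the role of MSSQL's ISNULL: returns -1 on the first run
    row = conn.execute(
        "SELECT COALESCE(MAX(DI_SEQUENCE_NUMBER), -1) "
        "FROM DW_LOAD_CONTROL WHERE TARGET_NAME = ?",
        (target_name,),
    ).fetchone()
    return row[0]

print(last_sequence("PUBLISHER"))  # -1 : first run, no history yet
conn.execute("INSERT INTO DW_LOAD_CONTROL VALUES ('PUBLISHER', 42)")
print(last_sequence("PUBLISHER"))  # 42 : resumes after the last extracted sequence
```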

2. Here PUBLISHER is our MSSQL CDC-enabled source table. The Query transform extracts only those rows whose DI_SEQUENCE_NUMBER is greater than the maximum DI_SEQUENCE_NUMBER extracted during the last run. We have also selected the DI_SEQUENCE_NUMBER column in the ORDER BY tab with sort order Ascending.
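The Query transform's filter and ORDER BY can be sketched in Python (illustrative row layout with a hypothetical PUB_ID column; in the real job this logic is pushed down as SQL):

```python
def extract_delta(rows, last_seq):
    """Keep rows newer than the last extracted sequence, oldest first."""
    delta = [r for r in rows if r["DI_SEQUENCE_NUMBER"] > last_seq]
    return sorted(delta, key=lambda r: r["DI_SEQUENCE_NUMBER"])

rows = [
    {"DI_SEQUENCE_NUMBER": 7, "DI_OPERATION_TYPE": "U", "PUB_ID": 1},
    {"DI_SEQUENCE_NUMBER": 5, "DI_OPERATION_TYPE": "I", "PUB_ID": 1},
    {"DI_SEQUENCE_NUMBER": 3, "DI_OPERATION_TYPE": "I", "PUB_ID": 2},  # already loaded
]

print([r["DI_SEQUENCE_NUMBER"] for r in extract_delta(rows, last_seq=4)])  # [5, 7]
```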

Query Extraction

3. Next comes the Map_CDC_Operation transform, followed by the target table, which is a replica of the source table.

Map_CDC_Operation transform

The same functionality can also be achieved with a Case transform plus three Map_Operation transforms; Map_CDC_Operation is simply more convenient.
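That alternative route can be pictured as one splitter plus three fixed mappings. The Python sketch below (illustrative, not DS code) routes each row by its operation code the way a Case transform would, with each branch playing the role of a Map_Operation set to insert, update, or delete:

```python
def route_by_op(rows):
    """Split CDC rows into insert/update/delete streams, as Case + Map_Operation would."""
    streams = {"I": [], "U": [], "D": []}
    for row in rows:
        op = row["DI_OPERATION_TYPE"]
        if op in streams:  # before-image rows ('B') fall through, unrouted
            streams[op].append(row)
    return streams

rows = [
    {"DI_OPERATION_TYPE": "I", "PUB_ID": 1},
    {"DI_OPERATION_TYPE": "B", "PUB_ID": 1},
    {"DI_OPERATION_TYPE": "U", "PUB_ID": 1},
    {"DI_OPERATION_TYPE": "D", "PUB_ID": 2},
]
print({k: len(v) for k, v in route_by_op(rows).items()})  # {'I': 1, 'U': 1, 'D': 1}
```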

CDC based source Traditional method

NOTE

  • When a relational or mainframe CDC source table is used, the DI_OPERATION_TYPE and DI_SEQUENCE_NUMBER columns are automatically selected as the Row operation column and the Sequencing column of Map_CDC_Operation, respectively.
  • When a relational or mainframe CDC source table is used, the DI_OPERATION_TYPE and DI_SEQUENCE_NUMBER columns appear in the input schema but are automatically removed from the output schema. To propagate these columns to the output, create additional columns that map to them before the data reaches the Map_CDC_Operation transform.
  • The relational CDC sources supported by the transform include Oracle and SQL Server.
  • This transform is typically the last object before the target in a data flow, because it produces INSERT, UPDATE and DELETE operation codes. Data Services produces a warning if other objects are placed between Map_CDC_Operation and the target.

USE CASE

Map_CDC_Operation can be used to replicate source CDC table changes to a table of similar structure in a remote database or in the staging area of a data warehouse. It can also be used to keep a target table, for example in an SAP HANA database, in sync with a CDC-enabled source table.

Source-based CDC tables can also be used very effectively to load the slowly changing dimension tables of a data warehouse. Since the DML operation codes arrive along with the source records, it may no longer be necessary to look up the target dimension table during dimension loading.

What happens in Slowly Changing Dimension Type1

Database change data capture records the insert, update and delete activity applied to a table. Based on DI_OPERATION_TYPE and DI_SEQUENCE_NUMBER, we INSERT or UPDATE records accordingly in a Type 1 target dimension table. Records with DI_OPERATION_TYPE 'D' (deleted at the source) are usually filtered out, because we normally do not delete records from a dimension table.
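Under these assumptions, Type 1 handling reduces to an upsert that ignores source deletes. A minimal Python sketch (the PUB_ID/PUB_NAME columns are hypothetical):

```python
def apply_scd1(dimension, changes):
    """Upsert CDC changes into a Type 1 dimension keyed on the natural key.

    Deletes ('D') from the source are filtered out, since dimension rows
    are normally kept; before-images ('B') are ignored as well.
    """
    for row in sorted(changes, key=lambda r: r["DI_SEQUENCE_NUMBER"]):
        if row["DI_OPERATION_TYPE"] in ("I", "U"):
            dimension[row["PUB_ID"]] = row["PUB_NAME"]  # overwrite in place
    return dimension

dim = {1: "Old Name"}
changes = [
    {"DI_SEQUENCE_NUMBER": 1, "DI_OPERATION_TYPE": "U", "PUB_ID": 1, "PUB_NAME": "New Name"},
    {"DI_SEQUENCE_NUMBER": 2, "DI_OPERATION_TYPE": "D", "PUB_ID": 1, "PUB_NAME": None},
]
print(apply_scd1(dim, changes))  # {1: 'New Name'}  - the source delete is ignored
```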

What happens in Slowly Changing Dimension Type2

1. If a changed record arrives from the source that is not present in the target dimension table:

- A new entry is created in the dimension table with the Active Flag set to True/Yes and the record validity/end date set to NULL or some far-future date such as 2050.12.31.

2. If a changed record arrives from the source that is already present in the target dimension table:

- A new entry is created in the dimension table with the Active Flag set to True/Yes and the record validity/end date set to NULL or some far-future date such as 2050.12.31.
- The existing active record for the natural key in the dimension table is updated to set the Active Flag to False/No and the record validity/end date to SYSDATE.
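The two Type 2 cases above can be sketched as follows (Python, with a hypothetical row layout; the ACTIVE flag, END_DATE column and 2050.12.31 high date follow the conventions described above):

```python
from datetime import date

HIGH_DATE = date(2050, 12, 31)  # far-future end date for the active version

def apply_scd2(dimension, row, today):
    """Insert a new active version; expire the current one if it exists."""
    key = row["PUB_ID"]
    for version in dimension:
        if version["PUB_ID"] == key and version["ACTIVE"]:
            version["ACTIVE"] = False   # case 2: close the existing record
            version["END_DATE"] = today
    dimension.append({                  # cases 1 and 2: open a new version
        "PUB_ID": key,
        "PUB_NAME": row["PUB_NAME"],
        "ACTIVE": True,
        "END_DATE": HIGH_DATE,
    })
    return dimension

dim = [{"PUB_ID": 1, "PUB_NAME": "Old Name", "ACTIVE": True, "END_DATE": HIGH_DATE}]
apply_scd2(dim, {"PUB_ID": 1, "PUB_NAME": "New Name"}, today=date(2024, 1, 15))
print([(v["PUB_NAME"], v["ACTIVE"]) for v in dim])
# [('Old Name', False), ('New Name', True)]
```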

SCD Type 1 and Type 2 implementations cannot be achieved directly with Map_CDC_Operation alone. But given a CDC-enabled source table and a target data warehouse dimension of Type 1 or Type 2, the question remains whether we still need to look up the target dimension table. Check out the next article on Slowly Changing Dimension Type 1 and Type 2 implementation for a CDC-enabled source table.

