Table Comparison Transform to Implement Slowly Change Dimension (SCD) in Data Service
In this tutorial we will learn a new SAP Data services transform, known as Table Comparison Transform and we will see how we may use this transform to implement "slowly changing dimension" (SCD) Type - I. Like before, we have added a video tutorial at the end to perform a little hands-on.
This article is part of our comprehensive data services tutorial Learning SAP Data Services - Online Tutorial, you may want to check that tutorial first if you have not already done so.
Table Comparison Transform
This is one of the SAP BusinessObjects Data Services native transform (For a full list of all transforms, you can see details of SAP Data services transforms) that helps to perform a fundamental ETL operation - comparing two data sets.
Comparing two data sets, this transform can generate the difference between them as a resultant dataset with each row of the result flagged as INSERT, UPDATE, or DELETE.
While loading data to a target table, this transform can be used to ensure rows are not duplicated in a target table and hence is very helpful to load a dimension table.
Implementing SCD Type1 dimension loading
In the video below we have demonstrated how to implement SCD - I. We strongly encourage you to watch the video along with reading this text. The basic logic for implementing SCD-I is, we will compare each record coming from source to the records in target and if the record exists in target, we will update the target record with the values from the source record. However, if the record does not exist, we will insert the record to target.
"Table Comparison Transform" help to compare the source records with target. You have to provide the source data (input schema) and the columns based on which you want to perform the comparison. Generally we specify the primary key column (or all the columns forming the composite primary key) so that it can perform the comparison based on the primary key column. Further to this, we can also specify "Compare Columns" - which is a list of columns that we intend track the changes of. As an example, let's say we do <u>not</u> want to update a customer's record even if a certain column has changed in the source and only want to update the target record if some other columns (e.g. column X, Y and Z) have changed. In that case, we have to put column X, Y and Z in "Compare Columns" list.
Note on the Comparison Method
There are three methods for accessing the comparison table namely Row-by-row select, Cached comparison table and Sorted input. Below is the brief on when to select which option.
- Row-by-row select option is best if the target table is large compared to the number of rows the transform will receive as input. In this case for every input row the transform fires a SQL to lookup the target table.
- Cached comparison table option is best when we are comparing the entire target table. DS uses pageable cache as the default. If the table fits in the available memory, we can change the Cache type property of the dataflow to In-Memory.
- Sorted input option is best when the input data is pre sorted based on the primary key columns. DS reads the comparison table in the order of the primary key columns using sequential read only once. NOTE: The order of the input data set must exactly match the order of all primary key columns in the Table_Comparison transform.