SSIS- Derived Column Transform
This is a Fast Forward tutorial on Derived Column Transformation in Microsoft SQL Server Integration Services.
Derived Column
The Derived Column is very flexible and powerful Transformation used to create new column values by applying expressions to input columns of the transformation. An expression can contain any combination of variables, functions, operators, and columns from the transformation input. The result can be added as a new column or inserted into an existing column as a replacement value. The Derived Column transformation can define multiple derived columns, and any variable or input columns can appear in multiple expressions.
Implementation
Lets go straight to implementation. In the previous article we have learnt the basics related to SSIS Project, Packages, Connections, Control Flow and Data Flow. So lets design the Control Flow as below. Here we have taken a Execute SQL Task, to delete data from the Staging target table before the Data Flow Task. The Data Flow Task is responsible for the data movement logic.
Double click the Execute SQL Task to open the Editor. Next we set the Name as Execute_SQL_Task_DELETE_STAGE_TBL. Next we select the Connection Type followed by the Connection Object. Next we select the SQLSourceType as Direct Input. Next we put the desired SQLStatement that we want to execute prior to the Data Flow run. By default, the ByPassPrepare is set to True. In this case we can optionally set it to false as we are not using parameterized query. Find the screen shot below:
Below is the the SQL statement that will be executed prior to the Data Flow. It will delete the existing records in the target staging table.
Now let us define the Data Flow Task to specify the data movement logic.
Here we use a OLE DB Source, a Derived Column transformation and finally a OLE DB Destination.
Double-click to open the OLEDB Source Editor. Next we select the OLEDB connection object name from the OLEDB connection manager browser. Next select the database Table or View.
Here we select all the columns of the EMP table as our source.
Lets keep the Error Outputs as default.
Next let's have a look at the Advanced Editor. Right-click the OLEDB Source and select Show Advanced Editor.
Below OLEDB Source Advanced Editor- Component Properties.
Below OLEDB Source Advanced Editor- Column Mappings
Below OLEDB Source Advanced Editor- Input & Output Properties
Next we Double-click the Derived Column transformation to open the Editor. The Editor layout shows three panes- Left we have the input Columns to the transformation along with the set of System and User defined variables. On the right we have the set of Functions and operators to be used for column manipulation. Below is the area where we define the new sets of columns as Output. Here we define the Derived Column Name, Data Type, Length, Precision, Scale and the corresponding Expression.
let us define a new column as load_date to be populated with system date, using function GETDATE(). Next we define output column commission as- if input commission is NULL then we evaluate it as 0 else we populate the source commission value.
Next right-click the Derived Column transform and select Show Advanced Editor.
Below Derived Column Advanced Editor- Input Columns. Here we can rename a column name using Output Alias. Here we select the Usage Type for column comm as READONLY as this column is used to evaluate commission and we don't want this column as output; instead we need the derived column as the output.
Below Derived Column Advanced Editor- Input & Output Properties.
Finally open the OLEDB Destination Editor. Select the connection object and the target table.
Next we go to the Mappings tab and define the target column mapping.
Lets leave the OLEDB Target- Error Output as default.
Next right-click the OLEDB Destination and select Advanced Editor.
Below OLEDB Target Advanced Editor- Component Properties
Below OLEDB Target Advanced Editor- Column Mappings
Below OLEDB Target Advanced Editor- Input & Output Properties
Finally we save the Data Flow, Control Flow and Package. Next Execute the Package.
Use Derived Column transform to evaluate and manipulate input column values and generated desired output columns as a part of data movement logic.