Informatica

Loading Flatfiles delimited by comma and double quotes

Saurav Mitra

In this article let us take up a very trivial but an important aspect that we as DW developers usual face. This is related to loading flat file sources. Whenever we have flat file sources we usual ask source systems for a specific type of field delimiters.

Now suppose we have requested our source system for a comma separated flat file which will hold all Employee Information of an organization. Say we ask for a very simple file with five columns –Empno, Ename, Job, Sal, Address. Let us name the file as Emp_Src.txt. Below are the sample data in the file

EMPNO,ENAME,JOB,SAL,ADDRESS			
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,YEW TEE
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST

Very simple, right? Yes, it is. Still we will once again look into the nitty-gritty of importing a flat file in Informatica. The below diagram specifies the different steps while importing a flat file, in our case a simple comma separated file.

On successful execution of the Simple File to Table map, below is the dataset in the DB table:

Now suppose in the same flat file we have an address which has a comma in its text. Then what happens? Let us execute the same mapping with the below data.

The address for EmpNo 7566 has a comma in its text. His address is now Block 35, Yew Tee

EMPNO,ENAME,JOB,SAL,ADDRESS			
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,BLOCK 35,YEW TEE
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST

On successful execution, let us check the data in the table.

Flatfile Data Loaded in Database

Oops address got truncated for EmpNo 7566. How do we handle such case? In such scenarios we have to request Source Systems to enclose such text which has the delimiters as a part of the text with some other identifiers. Suppose in our case we request source to send us files with such texts within double quotes. Now the file would look like:

EMPNO,ENAME,JOB,SAL,ADDRESS			
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,"BLOCK 35,YEW TEE"
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST

Now we have to make a small change in the source definition. Double click to edit on the Source Definition of the flat file and click on the Advanced button

Advanced Option for Flatfile Loading

And then check the type of Optional Quotes that our source system has agreed to send. In our case we would need to check the Double check box as shown below:

Double Quote option

Save and validate corresponding Mapping and Sessions and then re-execute it. Now let us check the result:

Double Quote showing correct data

We can also make the same change when defining the source definition at the very beginning while importing. The specification has to be given as Text qualifier in Step 2 as shown below:

text qualifier