Informatica - Handling duplicate rows while reading from Source

Problem Summary

Duplicate rows in the source that cause primary key violation errors on the target table are a common issue when running PowerCenter sessions.
You can use a Lookup transformation to find duplicate data in a target based on a key field (or a composite key). That works when comparing source rows to rows that already exist in the target table.
But this technique does not entirely work when the source itself contains duplicate rows that do not yet exist in the target.

You may be tempted to use an uncached Lookup against the target and set the target commit interval to one, so that the target table is queried for each row. Because of the way the PowerCenter Server processes rows (as described in the Lookup Strategies section below), this approach does not work. Even if it did, it would be very inefficient and performance would suffer severely.


Lookup Strategies 

If you have duplicate rows in your source data, a Lookup transformation will not reliably detect them.

A cached Lookup transformation caches the target rows at the beginning of the session, so it can detect duplicates that are already in the target, but not new duplicates being loaded in the same session. Turning off caching will not reliably catch them either: the target is re-queried for each row, but uncommitted rows temporarily stored in buffers are not visible to the Lookup. Setting the commit interval to one also does not work reliably. A database commit takes some milliseconds to complete, and the transformation engine does not pause while it happens - it keeps processing rows and buffering them in memory. By the time the commit completes, 100 rows might already have been processed; if the 101st row is a duplicate the Lookup will find it, but a duplicate within the first 100 rows will slip through. A dynamic Lookup cache can be used to overcome this limitation.


The following examples demonstrate some of the most efficient ways to deal with duplicate source data.

Rank() Function in Source Qualifier SQL
Eliminate the duplicates at the time you read the data from the source, by ranking the rows in the Source Qualifier SQL override.
Alternatively, if you want to keep the duplicate entries as older versions of the records, pull all the rows with their respective ranks, use a Router transformation to send the rank = 1 rows through your normal load logic, and load the rank != 1 rows as older versions of the records.
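A minimal sketch of such an override, assuming a hypothetical source table CUSTOMER_SRC keyed on CUSTOMER_ID, with a LAST_UPDATED column deciding which copy wins (ROW_NUMBER() is used rather than RANK() so that exact duplicates still receive distinct values):

SELECT CUSTOMER_ID,
       CUSTOMER_NAME,
       LAST_UPDATED,
       ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID
                          ORDER BY LAST_UPDATED DESC) AS RNK
FROM   CUSTOMER_SRC

Wrap this in a subquery with WHERE RNK = 1 to keep only the latest copy at the source, or expose RNK as a port and route on it in the mapping as described above.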

Aggregator Mapping - essentially the same as above, but within Informatica.

When to use: any time the Aggregator transformation has no major impact on performance, and the source is either relational or a file where output row order does not matter.

Pros: Ease of implementation. Works with all types of sources.

Cons: limited choice of which row is kept. The Aggregator cache can slow performance on large loads. Duplicates are removed silently, so there is no audit trail.
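Conceptually, the Aggregator performs the equivalent of a SQL GROUP BY: mark the key ports as Group By, and for ports without an aggregate function the Aggregator returns the last row of each group. A rough SQL equivalent, using the same hypothetical CUSTOMER_SRC table as above:

SELECT CUSTOMER_ID,
       MAX(LAST_UPDATED) AS LAST_UPDATED
FROM   CUSTOMER_SRC
GROUP  BY CUSTOMER_ID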






Informatica - Cannot start a connector in an input only port

Edit the input transformation -> Ports tab -> enable all the output ports that you want to connect to the next transformation.

Informatica - Unspecified error - Writer execution failed - Row # [*] in bad file

Severity: ERROR
Timestamp:
Node: node01
Thread: WRITER_1_*_1
Process ID: 7572
Message Code: WRT_8229
Message: Database errors occurred:
Microsoft SQL Server Native Client 11.0: Unspecified error


Database driver error...
Function Name : Execute Multiple
SQL Stmt : INSERT INTO dbo.table(columns)  VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)


--------

Severity: ERROR
Timestamp:
Node: node01
Thread: WRITER_1_*_1
Process ID: 7572
Message Code: WRT_8425
Message: ERROR: Writer execution failed.

-------
Severity: ERROR
Timestamp: 
Node: node01
Thread: WRITER_1_*_1
Process ID: 7572
Message Code: WRT_8114
Message: 
Row # [1] in bad file


This could be a data issue. Follow the link below to see how to enable debugging in Informatica and to find out which records were rejected and why.



Informatica - The metadata could not be determined because every code path results in an error

Severity: ERROR
Timestamp: time
Node:NodeName
Thread: WRITER_1_*_1
Process ID: 5348
Message Code: WRT_8229
Message: Database errors occurred: Microsoft SQL Server Native Client 11.0: Unspecified error

Microsoft SQL Server Native Client 11.0: The metadata could not be determined because every code path results in an error; see previous errors for some of these.
SQL State: 42000 Native Error: 11529
State: 1 Severity: 16
SQL Server Message: The metadata could not be determined because every code path results in an error; see previous errors for some of these.

Microsoft SQL Server Native Client 11.0: Invalid object name 'YourTargetTableName'.
SQL State: 42S02 Native Error: 208
State: 1 Severity: 16
SQL Server Message: Invalid object name 'YourTargetTableName'.


Database driver error...
Function Name : Execute Multiple
SQL Stmt : INSERT INTO **** VALUES (****)

************************************************
Informatica is not able to find the object/table named 'YourTargetTableName'.
Ensure that you are using the correct table name, and check your database/schema name as well.
In my case, the session was looking for the table in a different schema.
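To confirm which schema actually owns the table, a quick check you can run directly on the target SQL Server (not Informatica-specific):

SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
FROM   INFORMATION_SCHEMA.TABLES
WHERE  TABLE_NAME = 'YourTargetTableName';

If the table turns up under a different schema, correct the owner/table name in the target definition, or override it in the session's target properties (for example, via the table name prefix).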

“Error: Folder [ID=xx] not found” when running the pmcmd StartWorkflow command

Problem Description

When the pmcmd StartWorkflow command is run after importing and replacing workflows in existing repository folders, the following error appears:

“Error: Folder [ID=xx] not found”.

Cause

The error occurs because of repository agent connection caching. When a workflow runs, the Integration Service caches the folder ID, workflow ID, and other metadata so that subsequent runs start faster, since the metadata does not have to be fetched again. If that workflow is replaced, it is saved with a new workflow ID tied to the folder ID. When the workflow is then started using pmcmd, the cached metadata is reused because the workflow name is the same, but it now resolves to a different workflow ID, which leads to the error.



This error can also occur if the folder of the workflow was renamed and a new folder with the original name was created.

Solution

This is a known issue and a Change Request (CR 223242) has been submitted to be addressed in a future release.

Workaround


To resolve this issue, use one of the following workarounds:


  • Specify the folder name in the pmcmd startworkflow command using the -f switch. The -f switch is not mandatory, but in this case it avoids the error (see the example after this list).



  • If the -f option is already in use, check whether the workflow name is unique in the repository and try without the -f option.



  • Restart the Integration Service; this releases the cached metadata.



  • Import the workflows into a new folder and start the workflow from that folder.
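For example, the first workaround looks like this (the service, domain, folder, and workflow names are placeholders):

pmcmd startworkflow -sv IntegrationServiceName -d DomainName -u userName -p password -f FolderName -wait WorkflowNameToBeStarted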


More Information

Note: alternatively, disabling the repository agent caching option should serve as another possible workaround.

Informatica - How to Create a file with Command Task


echo .>C:\FilePath\FileName.txt
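Note that the command above writes a single dot into the file. If a truly empty (zero-byte) file is needed on Windows, type nul works as well:

type nul > C:\FilePath\FileName.txt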


Informatica - How to start a workflow with Command Task



pmcmd startworkflow -sv ISserviceName -d DomainName -u userName -p password -f KCMS -wait WorkflowNameToBeStarted


Using environment variables:

pmcmd startworkflow -sv %INFA_IS% -d %INFA_DOMAIN% -uv INFA_USER -pv INFA_PWD -f WorkflowFolderLocation -wait WorkflowNameToBeStarted


From Documentation

Starts a workflow.
The StartWorkflow command uses the following syntax in the command line mode:

pmcmd StartWorkflow

<<-service|-sv> service [<-domain|-d> domain] [<-timeout|-t> timeout]>

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<<-usersecuritydomain|-usd> usersecuritydomain|<-usersecuritydomainvar|-usdv> userSecuritydomainEnvVar>]

[<-folder|-f> folder]

[<-startfrom> taskInstancePath] 

[<-recovery|-norecovery>]

[<-paramfile> paramfile]

[<-localparamfile|-lpf> localparamfile]

[<-osprofile|-o> OSUser] 

[-wait|-nowait]

[<-runinsname|-rin> runInsName] 

Limitations of Pushdown Optimization - PDO

Limitations of pushdown optimization: the source, the target, and any lookups must be in the same database server for full pushdown. Expressions w...