DataFactory Remove Line Carry from Source dataset in Copy Data Task

56 views Asked by At

I am using the Parameter Driven approach with a Copy Data task in DataFactory.

Using a list that comes from a Azure SQL Database, that contains almost 100 tables from different source, I loop through this list and call a Copy Data task that will copy data from a source to a destination. The source being a collection of Azure and OnPrem SQL Server Databases, the destination being a Azure Storage Account Blob Storage (data is stored as CSV files).

The reason for this so I don't have to build 100+ different pipelines to ingest data into the blob storage.

Now I have a scenario that is creating some havoc.

A few tables have fields that contain data that has linefeeds in them (multiline data) which in the CSV files created on the Blob Storage now also have these line feeds in them. When I read the data from BlobStorage these line feeds justify a new line when actually shouldn't. enter image description here

Is there a way, considering my dynamic approach to ingesting data using Parameter Driven Copy Data Task, that I can prevent this?

It might also be noted that I use a Query (built by my dynamic list) instead of Table to obtain the source data. Perhaps I can format the data in SELECT query.

enter image description here

1

There are 1 answers

0
Aswin On

In order to do these changes dynamically, you can use dataflow activity instead of copy activity.

  • In data flow, take the derived column transformation. Click on +Add and then click +Add column pattern.
  • For every column to replace the CRLF you can give the expression as in below image. derive(each(match(true()), $$ = replace($$, char(13) + char(10),''))) ~> derivedColumn1

enter image description here

If column has multiline data, this replaces them to single line.

I use a Query (built by my dynamic list) instead of Table to obtain the source data. Perhaps I can format the data in SELECT query.

Since you are using query with the dynamic list, you can use SQL itself to format this. Syntax to replace CRLF in SQL is same as in dataflow.

Syntax:

REPLACE(@string, CHAR(13) + CHAR(10), '')