How azure data factory -Dataflow can create CRM related entities records in single transaction?

372 views Asked by At

I am trying to implement Azure Data factory DATAFLOW to create CRM entity records into multiple entities in single transaction. If any error occurred in the second entity, then first entity record should be rollback. Please share your idea.

I tried with Json file as input with multiple hierarchy, representing multiple CRM entity. I used Data flow source json dataset and 3 CRM sinks. But, i am unable to achieve single transaction when an error occurred.

1

There are 1 answers

0
Aswin On BEST ANSWER

ADF does not support roll back option. You can have any Watermark column or flag in the target table which indicates the records which got inserted during the current pipeline run and delete only those records if any error occurred.

Watermark column is the column which can have the timestamp at which the row got inserted or it can be incrementing key. Before running the pipeline, maximum value of the watermark column is noted. Whenever the pipeline is failed, rows inserted after the maximum watermark value can be deleted.

Instead of deleting all records from current pipeline run, if records which are not copied in some entities only need to be deleted then based on the key field, we can delete the rows. Below is the approach.

  • Source1 and source2 are taken with entity1 and entity2 data respectively.

enter image description here img1: entity1 data

enter image description here

img2: entity2 data

  • Id=6 is not copied to entity2. So, this should be deleted from entity1.
  • Exist transformation is added and left and right stream are given as source1 and 2 respectively. Exists type is doesn't exist. Exists conditions: source1@id = source2@id.

enter image description here img3: exists transformation settings

  • Alter row transformation is added and condition is given as delete if true().

enter image description here img4: Alter Row transformation settings

  • In sink settings, allow delete is selected and key column is selected as id.

enter image description here Img5: Sink settings

enter image description here img6: Sink data preview.

When pipeline with this dataflow is run, all rows which are in entity1 but not in entity2 are deleted.