Situation:
I have two databases DB1 and DB2. DB1 has 9 tables (out of these 9, any of the tables could get updated with only column to track that change i.e TIME_STAMP column).
Now I am trying to bring only the Updated/New records from these 9 tables from DB1 to DB2 (each table in DB1 map to one table in DB2) using Azure Data Factory and then run my final query on the tables in DB2. I have only read access on DB1.
Logic:
- Find the set of the Ids using UNION which got updated/inserted in these 9 tables
- Inner join the result of the UNION query with each of the 9 tables individually
Union Query Example:
SELECT DISTINCT idcolumn
FROM table1
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND '2020-11-07T13:14:13.807Z'
UNION
SELECT DISTINCT idcolumn
FROM table2
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND '2020-11-07T13:14:13.807Z'
UNION
SELECT DISTINCT idcolumn
FROM table3
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND '2020-11-07T13:14:13.807Z'
UNION
SELECT DISTINCT idcolumn
FROM table4
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND '2020-11-07T13:14:13.807Z'
And similarly for 5 other tables.
Problem:
Is there a way in ADF with which I can find the union query result only once and then join the resultant data with all the 9 tables in one go rather than running the union query 9 times with each table in 9 different copy activity?
You can achieve that with Data Flow. For example,
Here's the data flow overview:
Just with Copy active in ADF pipeline, it's impossible. We can not join the A Copy active source to B copy active's source.