Azure Data Factory -- I want to select records from the source based on subselect from the destination

486 views Asked by At

I would like to select records to copy (using the Copy Data activity) based on a subselect in the following form --

select from source_table where timestamp > (select max(timestamp) from destination_table)

This seems like about as elementary a piece of code as one could ask for, but I keep getting an error when I try to figure out how to approach the problem. It indicates it can't see the destination table when I put that select into the source tab. I get a similar issue when I put a pre-copy script in the sink. I have both source and destination tables linked in Azure.

I'm very new to this. Can someone tell me how to do this?

I'm having a devil of a time writing a google search to ask this question, and in the results to date Microsoft wants me to use a separate table to maintain the most recent loaded timestamp prior to the run, like one would do in FoxPro back in the 90s. This is a big honkin' table, and I only want to update the last two minutes' worth of additions. I'm just looking to say "update new stuff since the last update".

My best guess at what should work --

select * from FastQ_Branch_Appointment_Time_At_Time_Customer_Queued where time_queued > (select max(time_queued) from azure_FastQ_Branch_Appointment_Time_At_Time_Customer_Queued)

I'll mention that both source and destination tables have the same names, but the link to each is different. Both are linked data sources, one local in SQL*Server the other an Azure SQL table. Both are tested and work in the Copy Data activity when I'm not doing this select thing.

1

There are 1 answers

0
Aswin On

To solve this, you can take the lookup activity in ADF to retrieve the maximum value of a column named time_queued from a table named azure_FastQ_Branch_Appointment_Time_At_Time_Customer_Queued of the destination dataset.

enter image description here

Query in Lookup activity dataset:

select max(time_queued) as max_timestamp from azure_FastQ_Branch_Appointment_Time_At_Time_Customer_Queued
  • Then take the copy activity and in source dataset give the source query as,
select * from FastQ_Branch_Appointment_Time_At_Time_Customer_Queued where time_queued > '@{activity('Lookup1').output.firstRow.max_timestamp}'

This SQL query selects all rows from the table where the time_queued value is greater than the maximum value retrieved by the Lookup activity.

enter image description here