What I have:
Two tables A and B.
A: Col1 Col2 Col3 Col1_Transformed Col2_Transformed
B: Col1 Col2 Col3
They have exact the same columns except some modified data, neither of them have primary key, unique field or whatever (it's extreme case that I need to handle).
Table A contains only part of data from table B. Lets say, I read table B in batches for 10000 rows, then I modify data and write it in to table A (*_Transform columns).
What I need:
I need to update table B using data from table A. It will be easy having primary key or some identity column:
UPDATE target
SET target.Col1 = source.__Col1
FROM B target
join A as source
ON target.Id = source.Id
But I don't have any.
Next natural guess is to relay on natural key, formed out of Col1 Col2 Col3.
UPDATE target
SET target.Col1 = source.__Col1
FROM B target
join A as source
ON target.Col1 = source.Col1 AND
target.Col2 = source.Col2 AND
target.Col3 = source.Col3
This thing will not work, the reason is in possible NULL value comparation. If target.Col1 or source.Col1 is NULL, than whole record is omitted from update.
Guess #3 (using row number):
I would like to add row_number to table A, it will be my key.
A: row_numb Col1 Col2 Col3 Col1_Transformed Col2_Transformed
And than just join table B:
UPDATE target
SET target.Col1 = source.__Col1
FROM B target
join A as source
ON ROW_NUMBER() OVER(ORDER BY target.Col1,target.Col2,target.Col3) = source.ROW_NUMB
Bur row_nubmer cannot be used in join clause.
Any ideas how to do it?
Seems like you want
NULL
to match for the joins, you could do that :