SQL Server Using TableDiff on large tables


We have a process which uses SQL Server's amazing tablediff utility via:

Microsoft SQL Server\100\COM\Tablediff.exe

It's SQL Server 2008 R2. It connects from one instance to another identical instance. It works very well!

I have a situation where a job that compares a single table, which now has 10767594 records, is taking 2.5 hours to complete. How can I improve this?

The process is triggered by a Windows Scheduled Task, which calls a .bat file containing the recommended command; that part works without issue. We have a couple of these jobs in place and have had them for some time. Only the job that compares the big table from instance to instance is taking too long.

I have realised that the source table has an index but the destination table does not. I will add an index to the destination table; what else can I do?

Does tablediff run better with indexes?

Is there a way to use tablediff more effectively?

For example, if I capture the lastProcessedID, can I run tablediff next time for only the records where id > lastProcessedID?
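A minimal sketch of the lastProcessedID ("watermark") idea, assuming ids only ever increase. It uses Python's built-in sqlite3 module so it runs standalone; big_table, payload, copy_new_rows and make_db are hypothetical names, not part of tablediff. Only rows above the watermark are read, so the cost grows with new data rather than with table size.

```python
import sqlite3


def copy_new_rows(src: sqlite3.Connection, dst: sqlite3.Connection,
                  last_processed_id: int) -> int:
    """Copy rows with id > last_processed_id from src to dst; return the new watermark."""
    # Read only the rows above the watermark, in id order.
    rows = src.execute(
        "SELECT id, payload FROM big_table WHERE id > ? ORDER BY id",
        (last_processed_id,),
    ).fetchall()
    dst.executemany("INSERT INTO big_table (id, payload) VALUES (?, ?)", rows)
    dst.commit()
    # The highest id copied becomes the next watermark; unchanged if nothing was new.
    return rows[-1][0] if rows else last_processed_id


def make_db(ids) -> sqlite3.Connection:
    """Build a throwaway in-memory copy of the hypothetical big_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                     [(i, f"row {i}") for i in ids])
    conn.commit()
    return conn


src = make_db(range(1, 11))   # source has ids 1..10
dst = make_db(range(1, 6))    # destination was last synced up to id 5
watermark = copy_new_rows(src, dst, last_processed_id=5)
print(watermark)  # 10
```

The catch with this design is that it only sees new inserts: updates or deletes to rows at or below the watermark are never detected, so an occasional full comparison would still be needed.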

Any advice would be great. Thank you in advance.

EDITED:

MY SOLUTION: This was a very big surprise. As I mentioned above, the 10 million+ record table was identical on the source and destination except for 2 indexes (on the source). After waiting until out of hours, since this is an internal production server, I applied the indexes to the source. Now I run the tablediff job, which has not been changed at all, and it completes in under 2 minutes. 2.5 hours to 2 mins!

I have accepted the answer below because it was very helpful. I did go down the merge replication path; however, after setting up replication and publishing, I found out that the production instance could not be a subscriber because replication had not been selected during installation. As Jason says, it's a reasonable amount of research, learning and setting up. Since I am not a DBA and had not looked at this before, it was a worthwhile experience.

Jason W (best answer)

The performance issue is because the remote queries pull every record from each side to do the comparison and generate the output. Indexes can help slightly by making the pull from each location a little faster, but the gain is not likely to be significant.
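To make "pull every record from each side" concrete, the hypothetical sketch below (not tablediff's actual implementation, which is not described here) compares two row streams in a single linear pass. It assumes both streams already arrive sorted by primary key; an index on the key is one way a server can supply rows in that order without a separate sort, which may be relevant to the speed-up reported in the question's edit. diff_sorted and the row layout are illustrative names only.

```python
from typing import Any, Iterable, Iterator, Optional, Tuple

Row = Tuple[int, Any]  # (primary key, rest of the row)


def diff_sorted(source: Iterable[Row], dest: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Merge-walk two key-sorted row streams and report differences.

    Yields ("src_only", key), ("dest_only", key) or ("mismatch", key).
    Each stream is read once, front to back, so memory use stays constant.
    """
    s_iter, d_iter = iter(source), iter(dest)
    s: Optional[Row] = next(s_iter, None)
    d: Optional[Row] = next(d_iter, None)
    while s is not None or d is not None:
        if d is None or (s is not None and s[0] < d[0]):
            # Key exists only on the source side.
            yield ("src_only", s[0])
            s = next(s_iter, None)
        elif s is None or d[0] < s[0]:
            # Key exists only on the destination side.
            yield ("dest_only", d[0])
            d = next(d_iter, None)
        else:
            # Same key on both sides: compare the rest of the row.
            if s[1] != d[1]:
                yield ("mismatch", s[0])
            s = next(s_iter, None)
            d = next(d_iter, None)


source = [(1, "a"), (2, "b"), (4, "d")]
dest = [(1, "a"), (2, "B"), (3, "c")]
print(list(diff_sorted(source, dest)))
# [('mismatch', 2), ('dest_only', 3), ('src_only', 4)]
```

If either side could not be read in key order, it would first have to be sorted or hashed, which is extra work on millions of rows.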

An incremental approach is definitely better. I don't believe tablediff directly supports comparing two queries; if it did, you could use something like EXCEPT or INTERSECT to do the comparisons. If you're trying to keep these databases in sync, why not consider other solutions, such as log shipping, mirroring, SSIS, replication, or clustering?
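A sketch of the EXCEPT idea, assuming both tables are reachable from one connection (on SQL Server, with two instances, that would need something like a linked server or a staged copy, which is not shown). It uses Python's built-in sqlite3 module only so it runs standalone, since SQLite also supports EXCEPT; src_table, dest_table, payload and compare_with_except are hypothetical names. The min_id parameter combines the set-based comparison with the incremental idea from the question.

```python
import sqlite3


def compare_with_except(conn: sqlite3.Connection, min_id: int = 0):
    """Return (rows missing from dest, rows only in dest) for ids > min_id."""
    # Rows present in the source but absent (or different) in the destination.
    missing = conn.execute(
        "SELECT id, payload FROM src_table WHERE id > ? "
        "EXCEPT "
        "SELECT id, payload FROM dest_table WHERE id > ? ORDER BY 1",
        (min_id, min_id),
    ).fetchall()
    # Rows present in the destination but absent (or different) in the source.
    extra = conn.execute(
        "SELECT id, payload FROM dest_table WHERE id > ? "
        "EXCEPT "
        "SELECT id, payload FROM src_table WHERE id > ? ORDER BY 1",
        (min_id, min_id),
    ).fetchall()
    return missing, extra


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE dest_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO src_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO dest_table VALUES (?, ?)", [(1, "a"), (2, "B")])

missing, extra = compare_with_except(conn)
print(missing)  # [(2, 'b'), (3, 'c')]
print(extra)    # [(2, 'B')]
```

Because EXCEPT compares whole rows, a changed row (id 2 here) shows up in both result sets; an unchanged row (id 1) shows up in neither.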