INNER JOIN on Linked Server Table much slower than Sub-Query


I came across this very odd situation, and I thought I would put it to the crowd to find out the WHY.

I have a query that was joining a table on a linked server:

select a.*, b.phone
from table_a a
join remote.table_b b on b.id = a.id
(lots of data on A, but very few rows on B)

This query was taking forever (I never even found out the actual run time). That is when I noticed B had no index, so I added one, but that didn't fix the issue. Finally, out of desperation, I tried:

select a.*, b.phone
from table_a a
join (select id, phone from remote.table_b) as b on b.id = a.id

This version of the query, in my mind at least, should return the same results, but lo and behold, it responds immediately!

Any ideas why one would hang and the other process quickly? And yes, I did wait to make sure the index had been built before running both.


There are 4 answers

5
devarc On BEST ANSWER

It's because sometimes (very often, in fact) the execution plans automatically generated by the SQL Server engine are not as good or as obvious as we would like. You can look at the execution plan in both situations. I suggest using a join hint in the first query, something like INNER MERGE JOIN.

Here is some more information about that:

http://msdn.microsoft.com/en-us/library/ms181714.aspx
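
As a rough sketch of what such a hint could look like on the first query (table_a and remote.table_b are the placeholder names from the question; whether a merge join is actually the best choice here is an assumption that should be checked against the execution plan):

```sql
-- Sketch only: asks for a merge join instead of letting the optimizer choose.
-- table_a and remote.table_b are the placeholder names from the question.
select a.*, b.phone
from table_a a
inner merge join remote.table_b b on b.id = a.id
```

Note that specifying a join hint also forces the join order as written in the query, so it is worth comparing the plan before and after.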

6
Dan Roberts On

Remote table as in not on that server? Is it possible that the join is actually making multiple calls out to the remote table while the subquery is making a single request for a copy of the table data, thus resulting in less time waiting on network?
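
One way to check this without waiting for the slow query to finish might be to request the estimated plan. A minimal sketch, assuming the query is run from a tool such as SSMS or sqlcmd, where GO separates batches:

```sql
-- Return the estimated plan as text instead of executing the query.
set showplan_text on
go
select a.*, b.phone
from table_a a
join remote.table_b b on b.id = a.id
go
-- Switch back to normal execution.
set showplan_text off
go
```

If the plan shows a Remote Query or Remote Scan operator on the inner side of a Nested Loops join, that would suggest the remote table is being accessed once per outer row.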

5
Oleg Dok On

For linked servers, the second variant fetches all the data locally and then does the join, whereas the first variant may do a nested loop join with a round trip to the linked server for every row in A.
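
The "fetch once, then join locally" behaviour can also be made explicit rather than left to the optimizer. A minimal sketch, assuming the question's placeholder names and a hypothetical temp table called #b:

```sql
-- Pull only the needed remote columns across the network once.
-- #b is a hypothetical temp table name used for illustration.
select id, phone
into #b
from remote.table_b

-- The join now runs entirely against local data.
select a.*, b.phone
from table_a a
join #b b on b.id = a.id

drop table #b
```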

0
Adrian Matteo On

I'm just going to have a guess here. When you access remote.b is it a table on another server?

If it is, the second query is probably faster because you run one query against the other server and get all the fields you need from B before processing the data. In the first query, you are processing data and making several requests to the other server at the same time.

Hope this helps.
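
Another way to send a single query to the other server is a pass-through query with OPENQUERY. This is only a sketch: it assumes "remote" is the name of the linked server and that table_b can be resolved in that server's default database, neither of which is confirmed by the question.

```sql
-- Assumption: "remote" is the linked server name and table_b resolves
-- in its default database. The inner query runs on the remote server
-- and only its result set is returned for the local join.
select a.*, b.phone
from table_a a
join openquery(remote, 'select id, phone from table_b') as b
    on b.id = a.id
```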