SQL Server query is very slow when number of items inside IN clause more than 4

1.5k views Asked by At

I have some complicated query with joins of many tables. It is difficult to put real query, because of complexity.

It is something like

select t1.id, t2.id, t1.name, t2.name 
from table1 t1, table2 t2
left join table3 t3 ON t2.id = t3.id
where t2.id = t1.ref_id
  and t1.ref_id IN ('id1', 'id2', 'id3', 'id4', 'id5', ...)

I found, that if I have inside IN clause only 4 or less values like this t1.ref_id IN ('id1', 'id2', 'id3', 'id4') it works very fast (16 ms). If I just add one id and make it 5 like this t1.ref_id IN ('id1', 'id2', 'id3', 'id4', 'id5') execution time increases to 40 times and becomes 600 ms.

I got it on SQL Server 2014.

It looks like there is some parameter, which controls this behavior. I tried this query in another SQL server (SQL Server 2008) and I could not find any limit.

My question: Is there any parameter, which controls this kind of behavior? or how to increase this strange limit to 50 for example.

I just want to increase it up to 30-50 instead of 4. Of course I do not want to create IN clause with hundreds and thousands of values.

UPDATE1

Sorry, I forgot to put t3.name to select, otherwise it looks like t3 I do not need:

select t1.id, t2.id, t1.name, t2.name, t3.name
from table1 t1, table2 t2
left join table3 t3 ON t2.id = t3.id
where t2.id = t1.ref_id
  and t1.ref_id IN ('id1', 'id2', 'id3', 'id4', 'id5', ...)

UPDATE2

Looks like I found a reason. The issue was not about number of items inside IN. Later I reproduced this issue with less than 4 ids (even with 1). It happens, because some of ids was not presented in t1.ref_id. when there were the ids, which do not exist in t1.ref_id, when it was fast, when I added id, which does exist in t1.ref_id, when it becomes slow. In my previous example id1 - id4 was not presented in t1.ref_id and id5 was presented. This is why when I add id5 it becomes slow. It becomes slow even if I just put only 1 id (id5) inside IN clause. Finally index on t1.ref_id solved the problem. There was no magic around 4 or 5 ids. It is just a coincidence in my specific example.

1

There are 1 answers

4
Gordon Linoff On BEST ANSWER

First, fix the query. Simple rule: Never use commas in the FROM clause.

select t1.id, t2.id, t1.name, t2.name 
from table1 t1 join
     table2 t2
     on t2.id = t1.ref_id left join
     table3 t3 
     on t2.id = t3.id
where t1.ref_id in ('id1', 'id2', 'id3', 'id4', 'id5', ...);

Based on the query, you have no need for table3 -- unless you care about duplicate rows. I would remove it.

Then, you need to consider indexes. I would suggest table1(ref_id, id, name) and table2(id, name).

Also, if ref_id is really a number, then don't put single quotes around the values in the list. Mixing strings and numbers can confuse the optimizer.