Query runs slow in one dataset, but faster in another bigger dataset; MSSQL server; Different execution plan. Why?

Question

Query runs slow in one dataset, but faster in another bigger dataset; MSSQL server; Different execution plan. Why?

317 views Asked by riceball At 30 August 2017 at 16:09

Here is the query:

select nd1.ansos_id
from nursdate nd1
where nd1.division_id = 2
  and nd1.unit_id = 19
  and nd1.nursdate_start_date =
          (select min(nd2.nursdate_start_date)
           from nursdate nd2
           where nd2.ansos_id = nd1.ansos_id
                 and nd2.nursdate_start_date >= all
                      (select nd3.nursdate_start_date
                       from nursdate nd3
                       where nd3.ansos_id = nd1.ANSOS_ID
                             and nd3.nursdate_start_date <= '2017-08-13 00:00:00'))

Below is true for both datasets

Statistics up to date;
Indices defragmented
General Properties for both datasets are the same: ex: Collation;
Also the 2 datasets have about the same amount of data. Actually the faster one has more data

Now it takes ~8 seconds to run in 1 dataset, but <1 second in another dataset. Here are the execution plan difference between the 2 datasets

"Actual number of Rows" & "Number of executions" in the slower dataset is astronomically higher
The faster one also has an additional node "Index Spool"

Screenshot #1: Same query, slow execution plan in one dataset

Screenshot #2: Same query, speedy execution plan in another dataset

How to address this? What can I do to get it to run fast in the 1st dataset? Thanks!

[EDIT] Slower Execution plan: (note the entire "nursdate" table has only 99K rows) https://www.brentozar.com/pastetheplan/?id=r1ZFFuNt-

Faster Execution plan: (note the entire "nursdate" table has 333K rows. It is somehow faster) https://www.brentozar.com/pastetheplan/?id=rJYMc_EKb

[EDIT] Here is some info on the data volume. The one on "mmt" has less data but runs slower

--mmt cnt: 99347
select count(*)
from mmt_sqa_v60.mmtsqav60.nursdate nd1

--heo cnt: 333275
select count(*)
from heo_sqa_v60_2k12.heosqav602k12.nursdate nd1

--mmt cnt: 2403
select count(*)
from mmt_sqa_v60.mmtsqav60.nursdate nd1
where nd1.division_id = 2
and nd1.unit_id = 19
and nd1.nursdate_start_date <= '2017-08-13 00:00:00'


--heo cnt: 5537
select count(*)
from heo_sqa_v60_2k12.heosqav602k12.nursdate nd1
where nd1.division_id = 1
and nd1.unit_id = 20
and nd1.nursdate_start_date <= '2017-08-13 00:00:00'

Original Q&A

There are 2 answers

Juan Carlos Oropeza On 30 August 2017 at 18:52

Not related to your problem but your query have some issues. I think your query function is find ansos_id with any record before @date correct me if I'm wrong.

The deepest level brings all the dates for ansos_id previous to a @date

select nd3.nursdate_start_date
from nursdate nd3
where nd3.ansos_id = nd1.ANSOS_ID
  and nd3.nursdate_start_date <= '2017-08-13 00:00:00')

then previous query compare against all those dates.

and nd2.nursdate_start_date >= all ( ...)

That is very inefficient, because the only date bigger or equal to all dates is the MAX(date)

Then you (select min(nd2.nursdate_start_date) when as I already explain there is only one date >= to all

I will say your query should be replace with this will return the last record before @date for each ansos_id:

WITH cte as (    
     select nd1.ansos_id, nursdate_start_date,
            RANK() OVER (PARTITION BY ansos_id
                         ORDER BY nursdate_start_date DESC) as rn
     from nursdate nd1
     where nd1.division_id = 2
       and nd1.unit_id = 19
       and nd1.nursdate_start_date <= '2017-08-13 00:00:00'
)
SELECT *
FROM cte
WHERE rn = 1

**riceball** · Accepted Answer · 2017-09-06T20:57:54+00:00

Mystery solved. It is a data issue. Here is where the high number of "execution plan" comes from:

select count(*)
from mmtsqav60.NURSDATE ndArea
left outer join mmtsqav60.NURSDATE ndRelated on ndRelated.ANSOS_ID = 
ndArea.ANSOS_ID
where ndArea.DIVISION_ID=2 and ndArea.UNIT_ID=19;

The result is 4157613 which is exactly how many rows the execution plan said are returned by the index seeks on nd2 and nd3. Because, the above logic is, in essence, what that nursdate query in WSM is asking the database server to do – find all records for an area and then for each such record find all records for that ansos_id. The data is unrealistic data for us, in that one single "ansos_id" has thousands of records. Thanks for all the people who helped here.

TechQA.

Query runs slow in one dataset, but faster in another bigger dataset; MSSQL server; Different execution plan. Why?

There are 2 answers

Related Questions in SQL

Related Questions in SQL-SERVER

Related Questions in PERFORMANCE

Related Questions in SQL-TUNING

Popular Questions

Popular Tags

Trending Questions