Update Random Sample in Large Table

659 views Asked by At

Using SQL Server 2012, I have a table with 7 million rows. PK column is a GUID (COMB GUID). I am trying to test the performance of a query and first need to update a random sampling of data, I want to change a column value (not the PK) of 50,000 rows.

Selecting Top 50,000 Order by NEWID() takes way too long, I think SQL Server is scanning the whole table. I cannot seem to get the syntax right for TABLESAMPLE, it returns an empty set.

What is the best way to get this to work?

1

There are 1 answers

0
Aaron Bertrand On BEST ANSWER

And to treat it as an update:

;WITH x AS
(
  SELECT TOP (50000) col 
  FROM dbo.table TABLESAMPLE (50000 ROWS)
)
UPDATE x SET col = 'something else';

But a couple of notes:

  1. You probably won't see a huge performance improvement over ORDER BY NEWID(). On a table with 1MM rows this took over a minute on my machine.
  2. The TOP is there because TABLESAMPLE doesn't guarantee the exact number of rows - it's based on a rough calculation of how many pages might contain 50,000 rows. You may end up with less or more depending on your fillfactor, how many variable-length columns, how many NULL values, etc. The TOP above will help limit it to 50,000 when the estimate leads to a larger number of pages being read, but it won't help if the estimate is under.

There is some discussion of this going on in another question right now.