Shuffling data across rows in a table

168 views Asked by At

I want to update a table to have the data shuffled around. Let's say I have a table with 5 columns. column 1 is the primary key so that'll remain unchanged. columns 2 and 3 will be shuffled to different rows. columns 4 and 5 will also be shuffled but as pairs (they must be moved together).

How would I write the script for this? The primary key is a random unique identifier so it doesn't correlate with the row number.

Before:

c1 c2 c3 c4 c5
10 1 2 3 4
20 5 6 7 8
30 9 10 11 12
40 13 14 15 16

After:

c1 c2 c3 c4 c5
10 5 14 11 12
20 9 6 15 16
30 1 2 3 4
40 13 10 7 8

To provide an update on why I am doing this, this isn't a production database or anything. I'm shuffling the data around to anonymize the data for testing purposes, that's about it.

To provide additional details, say this is a people table. I might shuffle the address or phone number to anonymize the person.

2

There are 2 answers

6
Salman A On

If temporary tables are allowed:

  • Create a temporary table that contains row_number based keys
    • One key acts as a secondary key
    • Other keys will consist secondary key values but sorted randomly
select *
     , sk = row_number() over (order by c1)
     , k1 = row_number() over (order by newid())
     , k2 = row_number() over (order by newid())
     , k3 = row_number() over (order by newid())
into #temp
from t
c1 c2 c3 c4 c5 sk k1 k2 k3
10 1 2 3 4 1 4 3 1
20 5 6 7 8 2 3 4 2
30 9 10 11 12 3 1 2 3
40 13 14 15 16 4 2 1 4

You can use an update-with-join query:

update t set
t.c2 = a.c2,
t.c3 = b.c3,
t.c4 = c.c4,
t.c5 = c.c5
from t
join #temp as x on t.c1 = x.c1
join #temp as a on x.sk = a.k1
join #temp as b on x.sk = b.k2
join #temp as c on x.sk = c.k3

DB<>Fiddle

0
Thom A On

You appear to be under the impression that all these values are unrelated; they aren't. 10 is related directly to 1, 2, 3 and 4, and those values to each other as well. If you "shuffle" the value of 10 (in c1) then the values 1, 2, 3 and 4 go with it as they are a single row.

If you need to do this type of thing then firstly I'd suggest that all these values are on different rows and denote what "c" they belong to. So your data would look more like this:

c v
1 10
2 1
3 2
4 3
5 4
1 20
2 5
3 6
4 7
5 8

etc...

If your data looks like that, it's far easier to get the results you want. You can order you data on something "random" and the use conditional aggregation to get the values into their respective columns:

WITH CTE AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY C ORDER BY NEWID()) AS I
    FROM (VALUES(1,10),
                (2,1),
                (3,2),
                (4,3),
                (5,4),
                (1,20),
                (2,5),
                (3,6),
                (4,7),
                (5,8))YT(C,V))
SELECT MAX(CASE C WHEN 1 THEN V END) AS C1,
       MAX(CASE C WHEN 2 THEN V END) AS C2,
       MAX(CASE C WHEN 3 THEN V END) AS C3,
       MAX(CASE C WHEN 4 THEN V END) AS C4,
       MAX(CASE C WHEN 5 THEN V END) AS C5
FROM CTE
GROUP BY I;

If you "can't" get your data in a better format, you could unpivot it before you repivot it:

WITH CTE AS(
    SELECT V.C,
           V.V,
           ROW_NUMBER() OVER (PARTITION BY V.C ORDER BY NEWID()) AS I
    FROM (VALUES(10, 1, 2, 3, 4),
                (20, 5, 6, 7, 8),
                (30, 9,10,11,12),
                (40,13,14,15,16))YT(C1,C2,C3,C4,C5)
         CROSS APPLY (VALUES(1,C1),
                            (2,C2),
                            (3,C3),
                            (4,C4),
                            (5,C5))V(C,V))
SELECT MAX(CASE C WHEN 1 THEN V END) AS C1,
       MAX(CASE C WHEN 2 THEN V END) AS C2,
       MAX(CASE C WHEN 3 THEN V END) AS C3,
       MAX(CASE C WHEN 4 THEN V END) AS C4,
       MAX(CASE C WHEN 5 THEN V END) AS C5
FROM CTE
GROUP BY I;