Need to split column into rows and columns

163 views Asked by At

I have a table like this:

ID  cst
1   string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8,string9, 65
2   string10;-3;string11;string12;56;string13;6;string14;string15;9
etc.

Now I want to split the cst column into 5 columns and multiple rows. So like this:

ID  C1       C2  C3        C4        C5
1   string1   3  string2   string3   34
1   string4  -1  string5   string6   12
1   string7   5  string8   string9   65
2   string10 -3  string11  string12  56
2   string13  6  string14  string15   9
etc.

How to accomplish this? I am on SQL-server 2017, so I can use the string_split function. The problem with this function is that it produces only one output column...

Preferably I would like yo create an UDF that outputs a table. The function would use these input parameters: the string, the separator character, the number of columns. So the function can be used dynamically with a varying number of columns.

ps. the strings can be of variable length of course.

2

There are 2 answers

1
Shnugo On BEST ANSWER

Try it along this:

Hint: There are some "normal" commas in your sample data. I suspected these as wrong and used semicolons. If this is wrong, you might use a general REPLACE() to use ";" instead of ",".

Create a declared table to simulate your issue

DECLARE @tbl TABLE(ID INT, cst VARCHAR(1000));
INSERT INTO @tbl(ID,cst) 
VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
     ,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');

--The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)

WITH cte AS
(
    SELECT t.ID
          ,B.Nr
          ,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
          ,(B.Nr-1) % 5 AS Position
          ,(B.Nr-1)/5 AS GroupingKey
    FROM @tbl t
    CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
    CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
)
SELECT ID
      ,GroupingKey
      ,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
      ,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
      ,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
      ,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
      ,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;

The idea in short:

  • we use APPLY to add your string casted to XML to the result set. This will help to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>)
  • We use another APPLY to create a tally on the fly with a computed TOP-clause. It will return as many virtual rows as there are elements in the XML
  • We use sql:column() to grab each element's value by its position and some simple maths to create a grouping key and a running number from 0 to 4 and so on.
  • We use GROUP BY together with MAX(CASE...) to place the values in the fitting column (old-fashioned pivot or conditional aggregation).

Hint: If you want this fully generically, with a number of columns not knwon in advance. You cannot use any kind of function or ad-hoc query. You would rather need some kind of dynamic statement creation together with EXEC within a stored procedure. to be honest: This might be a case of XY-problem. Such approaches are the wrong idea - at least in almost all situations I can think of.

UPDATE for SQL-Server 2017+

You are on v2017, this allows for JSON, which is a bit faster in position safe string splitting. Try this:

    SELECT t.ID
          ,A.*
    FROM @tbl t
    CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A

The general idea is the same. We transform a string to a JSON-array ("a,b,c" => ["a","b","c"]) and read it with APPLY OPENJSON(). You can perform the same maths at the "key" column and do the rest as above.

Just because it is ready here, this is the full query for v2017+

WITH cte AS
(
    SELECT t.ID
          ,A.[key]+1 AS Nr
          ,A.[value] AS ValueAtPosition
          ,A.[key] % 5 AS Position
          ,A.[key]/5 AS GroupingKey 
    FROM @tbl t
    CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
)
SELECT ID
      ,GroupingKey
      ,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
      ,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
      ,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
      ,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
      ,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;
0
Tim Biegeleisen On

The easiest option here honestly might be the following steps:

  1. Write out the current table to a CSV flat file, using semicolon as the separator (which is also the separator for the current cst column
  2. Then load the CSV using SQL Server's bulk loading tool, again with semicolon as the column separator. This will yield a table with 16 columns, ID, and then C1 through and including C15.
  3. Create a new table (ID, C1, C2, C3, C4, C5)

Then populate the above table using:

INSERT INTO newTable (ID, C1, C2, C3, C4, C5)
SELECT ID, C1, C2, C3, C4, C5      FROM loadedTable UNION ALL
SELECT ID, C6, C7, C8, C9, C10     FROM loadedTable UNION ALL
SELECT ID, C11, C12, C13, C14, C15 FROM loadedTable;

While the above suggestion might seem like a lot of work, SQL Server has poor support for regex and complex string splitting operations, especially on earlier versions. Working directly with your current table might be either not possible or more work than the above.