convert SQL output in databricks from long to wide

Question

convert SQL output in databricks from long to wide

111 views Asked by DataScienceDave At 10 December 2023 at 15:27

I have a very simple outcome in databricks that has 80M lines of data because there is a separate line for each PIN. The data is a listing by PIN with columns "pin", "device category", "device name". a PIN may have 1 or 3 or n devices and there are two possible device categories. I need to convert this from long to wide so I can have a distinct PIN with columns with the devices so I can use this to cluster. All the examples have distinct column names to plug in to the pivot IN statement. This doesn't need to be fancy where the column names change dynamically - they can just be "device1", "device2", "device3", "device4", etc. Any help you can provide would be greatly appreciated!

note: the column names "device1" etc do not need to be "devicen..". For example R just assigns column names. This what I am looking for here. However many columns are needed it just assigns them

Start at this - long

machid	device
123	A
456	B
123	C

Needs to be this - wide

machid	d1	d2	d3	d4
123	A	C
456	B

Original Q&A

There are 1 answers

**June7** · Answer 1 · 2023-12-11T11:30:58+00:00

This is syntax that worked for me. I tested in Access pass-through query and in SSMS. I tried testing at SQLFiddle.com so I could provide a demo but get error "Must declare the scalar variable "@cols"." However, not seeing STUFF() nor STRING_AGG() nor QUOTENAME() functions in Databricks documentation and not finding equivalents.

SET NoCount ON;

DECLARE @cols AS NVARCHAR(MAX), @sql AS NVARCHAR(MAX);

WITH GS AS(
SELECT machid, device, 
Row_Number() OVER(PARTITION BY machid ORDER BY machid, device) As GrpSeq 
FROM table1)

SELECT @cols = STUFF(
(SELECT ',' + QUOTENAME(G) AS [text()]
FROM (SELECT DISTINCT GrpSeq AS G FROM GS) AS G
FOR XML PATH('')), 1, 1, '');

SELECT @sql = 'SELECT *
       FROM (SELECT machid, device, 
             Row_Number() OVER(PARTITION BY machid ORDER BY machid, device) As GrpSeq FROM table1) AS Q
       PIVOT(Min(device) FOR GrpSeq IN(' + @cols + ')) AS P;';

EXEC sp_executesql @sql;

And this version also worked:

SET NoCount ON;

DECLARE @cols AS NVARCHAR(MAX), @sql AS NVARCHAR(MAX);

WITH GS AS(
SELECT machid, device, QUOTENAME(Row_Number() OVER(PARTITION BY machid ORDER BY machid, device)) As GrpSeq FROM Table1),
DS AS
(SELECT DISTINCT GrpSeq FROM GS),
CS AS
(SELECT STRING_AGG(GrpSeq,',') AS G FROM DS)

SELECT @cols = G FROM CS

SELECT @sql = 'SELECT *
            FROM (SELECT machid, device, Row_Number() OVER(PARTITION BY machid ORDER BY machid, device) As GrpSeq FROM Table1) AS Q
            PIVOT(Min(machid) FOR GrpSeq IN(' + @cols + ')) AS P;';

EXEC sp_executesql @sql;

In either, if I try concatenating any text to Row_Number expression, it fails.

TechQA.

convert SQL output in databricks from long to wide

There are 1 answers

Related Questions in SQL

Related Questions in WIDE-FORMAT-DATA

Popular Questions

Popular Tags

Trending Questions