I have a very simple outcome in databricks that has 80M lines of data because there is a separate line for each PIN. The data is a listing by PIN with columns "pin", "device category", "device name". a PIN may have 1 or 3 or n devices and there are two possible device categories. I need to convert this from long to wide so I can have a distinct PIN with columns with the devices so I can use this to cluster. All the examples have distinct column names to plug in to the pivot IN statement. This doesn't need to be fancy where the column names change dynamically - they can just be "device1", "device2", "device3", "device4", etc. Any help you can provide would be greatly appreciated!
note: the column names "device1" etc do not need to be "devicen..". For example R just assigns column names. This what I am looking for here. However many columns are needed it just assigns them
Start at this - long
machid | device |
---|---|
123 | A |
456 | B |
123 | C |
Needs to be this - wide
machid | d1 | d2 | d3 | d4 |
---|---|---|---|---|
123 | A | C | ||
456 | B |
This is syntax that worked for me. I tested in Access pass-through query and in SSMS. I tried testing at SQLFiddle.com so I could provide a demo but get error "Must declare the scalar variable "@cols"." However, not seeing STUFF() nor STRING_AGG() nor QUOTENAME() functions in Databricks documentation and not finding equivalents.
And this version also worked:
In either, if I try concatenating any text to Row_Number expression, it fails.