Thanks for the help! I'm using MS SQL Server 17 and attempting to group by an ID and find common pairings in a second column based on the shared ID. Most other questions involved finding any combination between multiple columns.
Here is some sample data:
/* Create sample data */
DROP TABLE IF EXISTS example
CREATE TABLE example (
PersonID int,
Place varchar(50)
)
INSERT INTO example (PersonID, Place)
VALUES (1, 'home'), (2, 'work'), (3, 'gym'), (1, 'grocery'), (1, 'home'), (2, 'gym'), (3, 'work'),
(4, 'school'), (2, 'gym'), (3, 'gym'), (4, 'home'), (4, 'school'), (4, 'work'), (5, 'bar')
SELECT * FROM example
Order by PersonID asc
Whenever a PersonID
has more than one row, I'd like to see the common pairings of Place
in the following format (for a Sankey chart).
from | to | count
____________________________
gym | gym | 2
gym | work | 2
school | school | 1
home | home | 1
school | work | 1
grocery | home | 1
A pairing can be for the same place, e.g. PersonID == 1
went 'home'
twice, but I only need pairings of two in the from-to format.
So far I've tried the STRING_AGG function, but I'm having difficulty limiting it to just two-way pairings. Really appreciate the help and I apologize if this is a simple answer that's been solved before.
Attempt:
/* Next, let's try to make our Sankey data (from, to, count) */
DROP TABLE IF EXISTS temp_example
SELECT t.combination, COUNT(*) AS value
INTO temp_example
FROM (SELECT STRING_AGG(Place, ',') within group (order by Place) combination
FROM example
GROUP BY PersonID
HAVING COUNT(*) >= 2
) t
GROUP BY t.combination
ORDER BY value desc
First, you need another column. One which can be used to identify which order the person visited the places. SQL tables are unordered, so the order you insert the data isn't enough. For example, add a timestamp column or something?
Then, use LAG() to find out the place that was previously visited for each row. After which it's a simple GROUP BY.
(Apologies for typos, etc, I'm on my phone)