Combination of Column Values in SQL Based on Common ID

729 views Asked by At

Thanks for the help! I'm using MS SQL Server 17 and attempting to group by an ID and find common pairings in a second column based on the shared ID. Most other questions involved finding any combination between multiple columns.

Here is some sample data:

/* Create sample data */ 
 DROP TABLE IF EXISTS example
 CREATE TABLE example (
    PersonID int,
    Place varchar(50)
 )

 INSERT INTO example (PersonID, Place)
 VALUES (1, 'home'), (2, 'work'), (3, 'gym'), (1, 'grocery'), (1, 'home'), (2, 'gym'), (3, 'work'), 
        (4, 'school'), (2, 'gym'), (3, 'gym'), (4, 'home'), (4, 'school'), (4, 'work'), (5, 'bar')

 SELECT * FROM example
 Order by PersonID asc

Whenever a PersonID has more than one row, I'd like to see the common pairings of Place in the following format (for a Sankey chart).

from      | to       | count
____________________________
gym       | gym      | 2
gym       | work     | 2
school    | school   | 1
home      | home     | 1
school    | work     | 1
grocery   | home     | 1 

A pairing can be for the same place, e.g. PersonID == 1 went 'home' twice, but I only need pairings of two in the from-to format.

So far I've tried the STRING_AGG function, but I'm having difficulty limiting it to just two-way pairings. Really appreciate the help and I apologize if this is a simple answer that's been solved before.

Attempt:

/* Next, let's try to make our Sankey data (from, to, count) */
DROP TABLE IF EXISTS temp_example
SELECT t.combination, COUNT(*) AS value
INTO temp_example
FROM (SELECT STRING_AGG(Place, ',') within group (order by Place) combination 
           FROM example
           GROUP BY PersonID
           HAVING COUNT(*) >= 2
     ) t
GROUP BY t.combination
ORDER BY value desc
1

There are 1 answers

1
MatBailie On BEST ANSWER

First, you need another column. One which can be used to identify which order the person visited the places. SQL tables are unordered, so the order you insert the data isn't enough. For example, add a timestamp column or something?

Then, use LAG() to find out the place that was previously visited for each row. After which it's a simple GROUP BY.

WITH
  lagged AS
(
  SELECT
    *, 
    LAG(place) OVER (PARTITION BY PersonID ORDER BY aTimestampOrSomething) AS prevPlace
  FROM
    example
)
SELECT
  prevPlace,
  place,
  COUNT(*)
FROM
  lagged

(Apologies for typos, etc, I'm on my phone)