Is it possible in SQL to generate all pairs of participants from a list of events and participants? If so, how?

39 views Asked by At

The output of a system is a list of events and the identifiers of the participants of each event. That output is in two columns of an Excel worksheet that I've imported into a MySQL table. The first column is the event identifier, repeated N times where N is the number of participants who went to that event. The second column is the unique identifier of a participant who went to an event. There are an arbitrary number of events and an arbitrary number of participants at each event.

From this dataset I'd like to use SQL to generate the list of pairwise participants (i.e., dyads) of all participants and the event(s) that brought them together. The output would be in three columns where the first column was an event identifier, the second column was the identifier of one participant at that event, and the third column was the identifier of another participant at that event. There should be no duplicate "reverse" dyads (e.g. "1, A, B" AND "1, B, A") and no self-pairs (e.g., "1, A, A").

The input looks like this:

EventID PersonID
1 A
1 B
1 E
2 A
2 C
2 E
2 F

and the output of the SQL code would look like this:

EventID Person1ID Person2ID
1 A B
1 A E
1 B E
2 A C
2 A E
2 A F
2 C E
2 C F
2 E F

Is this even possible in SQL? If so, what would the code look like? (This is beyond my SQL abilities as it seems like it would require some procedural language-type actions to do counting and/or looping.)

Thank you for helping on this problem!

I don't even know how to start creating the SQL code to complete the processing I need. I suspect it requires setting a counter to reset for every "next" event before calculating the dyad pairs, but I have no clue about how to do the combinatoric generation of each dyad pair of participants when the number of participants is different for each event, much less how to generate the output in three columns.

1

There are 1 answers

1
marcothesane On

Join the in data table with itself in two roles: small (I use wee) and big; on equality of the event id and the wee.personid being less than the big.personid.

WITH
-- your input ...
indata(EventID,PersonID) AS (
          SELECT 1,'A'
UNION ALL SELECT 1,'B'
UNION ALL SELECT 1,'E'
UNION ALL SELECT 2,'A'
UNION ALL SELECT 2,'C'
UNION ALL SELECT 2,'E'
UNION ALL SELECT 2,'F'
)
-- end of your input, query starts here ...
SELECT
  wee.eventid
, wee.personid AS person1id
, big.personid AS person2id
FROM indata wee
JOIN indata big
  ON wee.eventid =big.eventid
 AND wee.personid<big.personid
ORDER BY 1,2,3
;
eventid person1id person2id
1 A B
1 A E
1 B E
2 A C
2 A E
2 A F
2 C E
2 C F
2 E F