SQL: How do I group by a unique combination of two columns?

Context:

  • A table message has the columns from_user_id and to_user_id
  • The user should see the recent conversations with the last message displayed
  • A conversation consists of all messages that share the same pair of user IDs, regardless of direction (the user sends some of them and receives the others)

Table content:

+-------------------------------------------------+--------------+------------+
| text                                            | from_user_id | to_user_id |
+-------------------------------------------------+--------------+------------+
| Hi there!                                       |           13 |         14 | <- Liara to Penelope
| Oh hi, how are you?                             |           14 |         13 | <- Penelope to Liara
| Fine, thanks for asking. How are you?           |           13 |         14 | <- Liara to Penelope
| Could not be better! How are things over there? |           14 |         13 | <- Penelope to Liara
| Hi, I just spoke to Penelope!                   |           13 |         15 | <- Liara to Zara
| Oh you did? How is she?                         |           15 |         13 | <- Zara to Liara
| Liara told me you guys texted, how are things?  |           15 |         14 | <- Zara to Penelope
| Fine, she's good, too                           |           14 |         15 | <- Penelope to Zara
+-------------------------------------------------+--------------+------------+

My attempt was to group by from_user_id and to_user_id, but that obviously gives me one group for the messages received by the user and another group for the messages sent by the user.

SELECT text, from_user_id, to_user_id,created FROM message 
WHERE from_user_id=13 or to_user_id=13
GROUP BY from_user_id, to_user_id
ORDER BY created DESC

Gets me:

+-------------------------------+--------------+------------+---------------------+
| text                          | from_user_id | to_user_id | created             |
+-------------------------------+--------------+------------+---------------------+
| Oh you did? How is she?       |           15 |         13 | 2017-09-01 21:45:14 | <- received by Liara
| Hi, I just spoke to Penelope! |           13 |         15 | 2017-09-01 21:44:51 | <- sent by Liara
| Oh hi, how are you?           |           14 |         13 | 2017-09-01 17:06:53 |
| Hi there!                     |           13 |         14 | 2017-09-01 17:06:29 |
+-------------------------------+--------------+------------+---------------------+

Although I want:

+-------------------------------+--------------+------------+---------------------+
| text                          | from_user_id | to_user_id | created             |
+-------------------------------+--------------+------------+---------------------+
| Oh you did? How is she?       |           15 |         13 | 2017-09-01 21:45:14 | <- Last message of conversation with Zara
| Oh hi, how are you?           |           14 |         13 | 2017-09-01 17:06:53 |
+-------------------------------+--------------+------------+---------------------+

How can I achieve that?

EDIT: Using least or greatest does not lead to the required results either. It does group the entries correctly, but as you can see in the result, the last message is incorrect.

+----+-------------------------------------------------+------+---------------------+--------------+------------+
| id | text                                            | read | created             | from_user_id | to_user_id |
+----+-------------------------------------------------+------+---------------------+--------------+------------+
|  8 | Oh you did? How is she?                         | No   | 2017-09-01 21:45:14 |           15 |         13 |
|  5 | Could not be better! How are things over there? | No   | 2017-09-01 17:07:47 |           14 |         13 |
+----+-------------------------------------------------+------+---------------------+--------------+------------+

There are 3 answers

Gordon Linoff On BEST ANSWER

One method of doing what you want uses a correlated subquery to find the maximum (most recent) created date/time for a matching conversation, in either direction:

SELECT m.*
FROM message m
WHERE 13 in (from_user_id, to_user_id) AND
      m.created = (SELECT MAX(m2.created)
                   FROM message m2
                   WHERE (m2.from_user_id = m.from_user_id AND m2.to_user_id = m.to_user_id) OR
                         (m2.from_user_id = m.to_user_id AND m2.to_user_id = m.from_user_id) 
                  )
ORDER BY m.created DESC
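
The asker's edit mentions that grouping by LEAST/GREATEST forms the right groups but shows the wrong message. That is what you would expect when non-aggregated columns such as text are selected alongside a GROUP BY: MySQL may return them from any row of the group. A minimal sketch of one way around it, assuming the message table and created column from the question (the aliases user_lo, user_hi, last_created and latest are made up for illustration): compute only the latest timestamp per pair, then join back to fetch the full row.

```sql
-- Sketch only: latest message per conversation involving user 13.
-- Table and column names are taken from the question; the aliases
-- (user_lo, user_hi, last_created, latest) are illustrative.
SELECT m.*
FROM message m
JOIN (
    -- One row per unordered pair of users, with its newest timestamp.
    SELECT LEAST(from_user_id, to_user_id)    AS user_lo,
           GREATEST(from_user_id, to_user_id) AS user_hi,
           MAX(created)                       AS last_created
    FROM message
    WHERE 13 IN (from_user_id, to_user_id)
    GROUP BY user_lo, user_hi
) latest
  ON  LEAST(m.from_user_id, m.to_user_id)    = latest.user_lo
  AND GREATEST(m.from_user_id, m.to_user_id) = latest.user_hi
  AND m.created = latest.last_created
ORDER BY m.created DESC;
```

Like the correlated subquery, this returns more than one row for a conversation if two of its messages share the same latest created value.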

Juan Carlos Oropeza On

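I use GREATEST and LEAST to build a grp key for each conversation. Then I sort by that key and assign a row number based on the time, newest first. The query uses Table1 and `time`; substitute your own table and timestamp column (message and created in the question).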

SELECT *
FROM (
        SELECT LEAST(`from_user_id`, `to_user_id`) as L,
               GREATEST(`from_user_id`, `to_user_id`) as G,
               `text`,
               CONCAT(LEAST(`from_user_id`, `to_user_id`), '-', GREATEST(`from_user_id`, `to_user_id`)) as grp,
               @rn := if(@grp = CONCAT(LEAST(`from_user_id`, `to_user_id`), '-', GREATEST(`from_user_id`, `to_user_id`)),
                         @rn + 1,
                         if(@grp := CONCAT(LEAST(`from_user_id`, `to_user_id`), '-', GREATEST(`from_user_id`, `to_user_id`)), 1, 1)
                         ) as rn,
               `time`
        FROM Table1
        CROSS JOIN (SELECT @rn := 0, @grp := '') as var
        ORDER BY LEAST(`from_user_id`, `to_user_id`),
                 GREATEST(`from_user_id`, `to_user_id`),
                 `time` DESC
     ) T
WHERE rn = 1;

EDIT: At the end you also need to keep only the conversations that involve user 13:

WHERE rn = 1
  AND 13 IN (`L`, `G`);

Thorsten Kettner On

The latest message of each conversation with #13? In a more up-to-date DBMS you'd use row_number() to find these. In MySQL you can use not exists to make sure that there is no later message between the same conversation partners. By the way, you can easily get the partner's ID with from_user_id + to_user_id - 13. (And when comparing two records that both involve #13, you can just compare from_user_id + to_user_id, since equal sums then mean the same partner.)

select text, from_user_id, to_user_id, created
from message m1
where 13 in (from_user_id, to_user_id)
and not exists
(
  select *
  from message m2
  where 13 in (m2.from_user_id, m2.to_user_id)
  and m2.from_user_id + m2.to_user_id = m1.from_user_id + m1.to_user_id
  and m2.created > m1.created
);
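
For completeness, here is a rough sketch of the row_number() variant mentioned above. It assumes a DBMS with window functions (MySQL 8.0 or later, for example) and the message table from the question; the aliases rn and numbered are just illustrative names. Because every row that passes the filter involves #13, the partner's ID is enough to identify the conversation.

```sql
-- Sketch only: requires window function support (e.g. MySQL 8.0+).
-- rn and numbered are illustrative aliases, not part of the schema.
SELECT text, from_user_id, to_user_id, created
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (
               -- partner's ID: the sum of both IDs minus the known user
               PARTITION BY from_user_id + to_user_id - 13
               ORDER BY created DESC
           ) AS rn
    FROM message m
    WHERE 13 IN (from_user_id, to_user_id)
) numbered
WHERE rn = 1
ORDER BY created DESC;
```

Unlike the not exists version, this always returns exactly one row per conversation, even when two messages have the same created value (which of the tied rows wins is then arbitrary unless you add a tie-breaker such as id to the ORDER BY).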