Need validation of consistent result set with group by and having clause

27 views Asked by At

I have a table as below

CREATE TABLE `zpost` (
  `post_id` int(10) UNSIGNED NOT NULL,
  `topic_id` int(10) UNSIGNED NOT NULL DEFAULT 0,
  `post_subject` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

With data set as

INSERT INTO `zpost` (`post_id`, `topic_id`, `post_subject`) VALUES
(44, 33, 'New topic by new user'),
(45, 33, 'Re: New topic by new user'),
(47, 33, 'Re: New topic by new user'),
(46, 34, 'New Topic by James on 1/2'),
(48, 35, 'Sep 29th new topic'),
(49, 35, 'Re: Sep 29th new topic'),
(50, 35, 'Re: Sep 29th new topic'),
(51, 36, 'Another Sep topic');

And indexes (not relevant to the question, but here)

ALTER TABLE `zpost`
  ADD PRIMARY KEY (`post_id`),
  ADD KEY `topic_id` (`topic_id`);

And finally the SQL

SELECT * FROM `zpost` group by `topic_id` having min(`topic_id`);

Finally the output

|post_id|topic_id|post_subject              |
+-------+--------+--------------------------+
|     44|      33|New topic by new user     |
|     46|      34|New Topic by James on 1/2 |
|     48|      35|Sep 29th new topic        |
|     51|      36|Another Sep topic         |

I only want the minimum post_id for a given topic_id - the 1st topic record. And I seem to get that by default. Not sure if that is just the way the database decided to provide the rows or if this is a consistent order. The database in question is MariaDB. I have also tried to insert data in reverse order in the database as below

INSERT INTO `zpost` (`post_id`, `topic_id`, `post_subject`) VALUES
(51, 36, 'Another Sep topic'),
(50, 35, 'Re: Sep 29th new topic'),
(49, 35, 'Re: Sep 29th new topic'),
(48, 35, 'Sep 29th new topic'),
(46, 34, 'New Topic by James on 1/2'),
(47, 33, 'Re: New topic by new user'),
(45, 33, 'Re: New topic by new user'),
(44, 33, 'New topic by new user');

And I still get the results I want, which is great news and no further action needs to be taken. But not sure why and for the sake of completeness if I wanted the last row (max post_id) how would I alter the SQL to get that row associated with each topic_id? One would think that changing min to max would take care of that, but no! I get the same result for this query as well.

SELECT * FROM `zpost` group by `topic_id` having max(`topic_id`);
1

There are 1 answers

3
The Impaler On

First of all, in relational databases the rows in a table don't have any inherent ordering. It doesn't matter in which order you insert, update, or delete them. Tables represent unordered bags of rows.

You can use ROW_NUMBER() to identify the rows you want.

To get the older post_id for each topic_id you can do:

select post_id, topic_id, post_subject
from (
  select *, row_number() over(partition by topic_id order by post_id) as rn
  from zpost
) x
where rn = 1

Result:

 post_id  topic_id  post_subject              
 -------- --------- ------------------------- 
 44       33        New topic by new user     
 46       34        New Topic by James on 1/2 
 48       35        Sep 29th new topic        
 51       36        Another Sep topic         

See running example at DB Fiddle - ASC.

To get the newest post_id for each topic_id you can do:

select post_id, topic_id, post_subject
from (
  select *, row_number() over(partition by topic_id order by post_id desc) as rn
  from zpost
) x
where rn = 1

Result:

 post_id  topic_id  post_subject              
 -------- --------- ------------------------- 
 47       33        Re: New topic by new user 
 46       34        New Topic by James on 1/2 
 50       35        Re: Sep 29th new topic    
 51       36        Another Sep topic         

See running example at DB Fiddle - DESC.