MySQL query slow when grouping by PM parent id (vBulletin database)

120 views Asked by At

I want to fetch all PMs from vBulletin as conversations like IM does, which should be used in my .NET Core library using Dapper. This means: A send message to B, B replys would be one conversation with two messages. Since this causes performance issues, I tried to figure it out using DBeaver by running the Dapper queries directly.

To fetch the conversations of a page from the inbox, I wrote the following query:

SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid 
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
GROUP BY IF(pm.parentpmid != 0, pm.parentpmid, pm.pmid)
LIMIT 0, 50

This gave me the first 50 conversation ids for the user #123. It works, but took ~440ms to execute. I tried adding indices to all relevant fields

ALTER TABLE pmtext ADD INDEX fromuserid_only(fromuserid);
ALTER TABLE pm ADD INDEX userid_only(userid);
ALTER TABLE pm ADD INDEX parentpmid(parentpmid);

but its still slow. It seems caused by the GROUP BY. Even when I just do GROUP BY pm.parentpmid (which would produce wrong data, but just for performance testing), the query run time is not better. When I remove the GROUP BY, it's pretty fast (~12ms).

My query that counts the total pages of conversations is similar without the join and its fast (< 20ms):

// DbConnection db = ...
string sqlTotalPages = @"
    SELECT CEIL(COUNT(*)/ 50) AS pages
   FROM pm, pmtext AS txt
    WHERE pm.pmtextid = txt.pmtextid 
    AND (pm.userid = 18 OR txt.fromuserid = 18)";
int totalPages = db.QueryFirstOrDefault<int>(sqlTotalPages);

Why does GROUP BY slow down the query so massively? How could I improve the performance?

Table structure from vB

CREATE TABLE `pm` (
  `pmid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `pmtextid` int(10) unsigned NOT NULL DEFAULT '0',
  `userid` int(10) unsigned NOT NULL DEFAULT '0',
  `folderid` smallint(6) NOT NULL DEFAULT '0',
  `messageread` smallint(5) unsigned NOT NULL DEFAULT '0',
  `parentpmid` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`pmid`),
  KEY `pmtextid` (`pmtextid`),
  KEY `userid` (`userid`,`folderid`),
  KEY `userid_only` (`userid`),
  KEY `parentpmid` (`parentpmid`)
) ENGINE=MyISAM AUTO_INCREMENT=221965 DEFAULT CHARSET=latin1

CREATE TABLE `pmtext` (
  `pmtextid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fromuserid` int(10) unsigned NOT NULL DEFAULT '0',
  `fromusername` varchar(100) NOT NULL DEFAULT '',
  `title` varchar(250) NOT NULL DEFAULT '',
  `message` mediumtext,
  `touserarray` mediumtext,
  `iconid` smallint(5) unsigned NOT NULL DEFAULT '0',
  `dateline` int(10) unsigned NOT NULL DEFAULT '0',
  `showsignature` smallint(5) unsigned NOT NULL DEFAULT '0',
  `allowsmilie` smallint(5) unsigned NOT NULL DEFAULT '1',
  `reportthreadid` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`pmtextid`),
  KEY `fromuserid` (`fromuserid`,`dateline`),
  KEY `fromuserid_only` (`fromuserid`),
  KEY `fromuserid_only2` (`fromuserid`)
) ENGINE=MyISAM AUTO_INCREMENT=118470 DEFAULT CHARSET=latin1
2

There are 2 answers

0
Erik H On

I think the reason why GROUP BY is causing such as increase in processing time is because of the LIMIT. When there's no GROUP BY the DB engine can stop processing rows in the query once it has found 50 that matches your criteria. With the GROUP BY clause though the entire table needs to be processed, grouped together and then the 50 first results will be returned. As for solution, would you get the correct result if you removed the GROUP BY and added "AND pm.parentpmid = 0" to the WHERE-clause? It seems the GROUP BY clause is there to remove rows with a parent from the result which is more efficiently done using WHERE (assuming all rows with a parent also has the parent present among the results).

0
Gert-Jan M On

In order to optimize your query, I need to know what you like to achieve with the group by clause. Could you give a small example filling of the tables with your expected outcome?

If you only want to show the parent mails then I agree with Erik H that it is better to use the following query:

SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid 
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
AND pm.parentpmid = 0
LIMIT 0, 50;

but that gives a different result than your query does.

The effect of your GROUP BY looks quite arbitrary to me. Since the pmid is not part of an aggregate function and it is not grouped on, MySQL/mariaDB will return the first value that applies for the same grouping.

When I add the following values to your database:

INSERT INTO pmtext (`fromuserid`, `fromusername`,`title`,`message`,`touserarray`,`iconid`,`dateline`,`showsignature`,`allowsmilie`,`reportthreadid`)
VALUES 
    (123, 'Pete',  'Titlel',            'Hello1', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title2',            'Hello2', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Title1',        'Hello3', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Title2',        'Hello4', '', 0, 0, 0, 1, 0),
    (3,   'Chris', 'Re: Title2(a)',     'Hello5', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Re: Title2(a)', 'Hello6', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title3',            'Hello7', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Re: Re: Title1',    'Hello8', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title4',            'Hello9', '', 0, 0, 0, 1, 0);

INSERT INTO pm ( `pmtextid`, `userid`, `folderid`, `messageread`, `parentpmid`)
VALUES
  (118470 , 123, 0, 0, 0),
  (118471 , 123, 0, 0, 0), 
  (118472 , 123, 0, 0, 221965),
  (118473 , 123, 0, 0, 221966), 
  (118474 , 123, 0, 0, 221966),
  (118475 , 123, 0, 0, 221969), 
  (118476 , 123, 0, 0, 0),
  (118477 , 123, 0, 0, 221967),
  (118478 , 123, 0, 0, 0);

Then your query will return:

  • 221965, That's "Title1"
  • 221966, That's "Title2"
  • 221972, That's "Re: Re: Title1"
  • 221970, That's "Re: Re: Title2(a)"
  • 221971, That's "Title3"
  • 221973, That's "Title4

Before I try to optimize your query, I need to know if this is the outcome that you expect.