I've got a recurring pattern in a system that I'm currently working on, where, for example, I need to select all the users that have orders under a list of possible companies. Or needing to select users if a record of this user being flagged exists.
My users
table contains 430,825 records, so this shouldn't really be that difficult to deal with. Right now I'm close, I have a query that get's that .047s execution time that I look for, but if I add one more piece to this, it gets real slow.
Here's my current query, the fast one:
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='[email protected]'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
(The extra select*from(...)
is because of this https://stackoverflow.com/a/1434712/728236)
In the middle there, I'm pulling users by further by email address, while checking other related tables for emails that might be related to this user. Like, the next piece searches users including their CC addresses when quotes are sent out to customers.
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='[email protected]'
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`userquotesemails`
where`Email`='[email protected]'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
I've added the alternate table to search for emails, but now the query takes 3.016s, which is way, way slower. It seems odd that, as I was building this query, that last part seemed to be the tipping point in performance here, what would be the cause of this?
The first and second explains, respectively
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
| 1 | PRIMARY | <subquery6> | | ALL | | | | | | 0.00 | Using temporary; Using filesort |
| 1 | PRIMARY | users | | eq_ref | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers | PRIMARY | 144 | <subquery6>.UserID | 1 | 50.00 | Using where |
| 6 | MATERIALIZED | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 10 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 8 | DEPENDENT SUBQUERY | usercustomerflags | | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | func | 1 | 4.95 | Using where |
| 4 | DEPENDENT SUBQUERY | quoterequests | | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest | 145 | func | 2 | 5.00 | Using where |
| 2 | DEPENDENT SUBQUERY | invoices | | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_UserID_idx | 145 | func | 1 | 3.33 | Using where |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
| 1 | PRIMARY | users | | ref | fk_users_1_idx,users_Customers | users_Customers | 4 | const | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 10 | SUBQUERY | usercustomerflags | | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | | | | 3509 | 4.94 | Using where |
| 8 | SUBQUERY | userquotesemails | | ref | userquotesemails_Email__Active,userquotesemails_UserID | userquotesemails_Email__Active | 1027 | const,const | 1 | 100.00 | |
| 6 | SUBQUERY | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 4 | SUBQUERY | quoterequests | | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID | 144 | const | 16702 | 10.00 | Using where |
| 2 | SUBQUERY | invoices | | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_CompanyID_idx | 144 | const | 17678 | 10.00 | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
Also, I've tried using joins, e.g. joining the invoices
table, etc. but then I get the issue of having duplicate user rows per each invoice
or quoterequest
the joins receive, and grouping/distinct & ordering the resulting data became immensely slow, in the minutes.
I've also tried just the "exists" version of the first query, as suggesting by the doc https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html like so
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or exists(select 1
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`)
or exists(select 1
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`userassociations`
where`_Email`='[email protected]'
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1
and`users`.`UserID`=`UserID`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
But that gets me up to 5.516s, so that definitely doesn't seem like the right direction to make.
What is the most efficient way to select data the way that I'm trying? Or do I need to restructure some of my tables to get the performance I'm looking for?
I've isolated the smallest sub-problem and bottle neck I think I have. Here's my lighter query
select`users`.`UserID`,`users`.`_Customer`
from`users`
left join`userassociations`on`userassociations`.`UserID`=`users`.`UserID`
and`userassociations`.`__Active`=1
where(`users`.`Email`='[email protected]'
or`userassociations`.`_Email`='[email protected]')
and`users`.`Deleted`=0
order by`users`.`DateTimeAdded`desc
limit 50;
And the explain
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users | | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers | 4 | const | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations | | ref | userassociations_UserID | userassociations_UserID | 144 | sterling.users.UserID | 1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
This query takes about 1.5 seconds to execute
CREATE TABLE `users` (
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
...
`Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
`DateTimeAdded` datetime DEFAULT NULL,
...
`Deleted` int(1) NOT NULL DEFAULT '0',
...
`_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
`_InvoiceCount` int(11) NOT NULL DEFAULT '0',
`_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),
...
KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
KEY `unify_email_INDEX` (`Email`(191),`UserID`),
...
KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
...
KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `userassociations` (
`UserAssociationID` binary(16) NOT NULL,
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
`__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`__Active` tinyint(1) NOT NULL DEFAULT '1',
`__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
`__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
PRIMARY KEY (`UserAssociationID`),
KEY `userassociations_UserID` (`UserID`),
KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
KEY `userassociations___UserID` (`__UserID`),
KEY `userassociations__Email` (`_Email`),
CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Hmm... So it seemed like it was working, but I've found a pair of tables that it doesn't seem to be as efficient with, which is my users
and invoices
tables.
I've have these indexes:
users: INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`)
invoices: INDEX(`UserID`, `__Active`)
invoices: INDEX(`CompanyID`)
users: INDEX(`UserID`, `Deleted`)
and the query
select`users`.`UserID`,`users`.`DateTimeAdded`
from`users`
join`invoices`on`invoices`.`UserID`=`users`.`UserID`
and`invoices`.`__Active`=1
where`invoices`.`CompanyID`='3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and`users`.`Deleted`=0
order by`DateTimeAdded`desc
limit 200;
This query alone takes .3 seconds, which feels slow to me, like it isn't making the best use of the indexes, especially because users
only has 430,997 rows and invoices
only has 194,180, and this looks like it should be a pretty simple query.
EDIT: Actually it's much worse than that, if the CompanyID given only includes ~4 rows, then this query takes 3.5 seconds
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
| 1 | SIMPLE | invoices | | ref | Invoice_CompanyID_idx,invoices_UserID___Active | Invoice_CompanyID_idx | 144 | const | 7750 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | | ref | users_UserID_Deleted | users_UserID_Deleted | 148 | sterling.invoices.UserID,const | 1 | 100.00 | |
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
For that smaller problem:
These will be needed:
(Let me know if you get syntax errors. If it is too slow, please provide
EXPLAIN
.)Index prefixing (
Email(191)
) is usually useless. Get rid if it. Here are 5 ways to avoid it: http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexesA PK is a UNIQUE key, so get rid of the second of:
Smells like a UUID; use ascii (ascii_general_ci), not utf8mb4:
INT(1)
takes 4 bytes; useTINYINT
for flags.