accelerate very large mysql join

502 views Asked by At

Problem:

  1. Query aborts after 10 minutes
  2. Query should be faster

I created the following query. This one was the fastest of several versions. Unfortunately, with more data, even this one abots after 600 seconds with an error "Error Code: 2013. Lost connection to MySQL server during query".

    CREATE OR REPLACE VIEW 1 AS
    SELECT `Timeperiod`    AS `Timeperiod`    ,
           "at"            AS `Domain`        ,
           `Content Groups`AS `Content Groups`,
    ...
    FROM   a

UNION ALL

SELECT `Timeperiod`    AS `Timeperiod`    ,
       "com"           AS `Domain`        ,
       `Content Groups`AS `Content Groups`,
       ...
FROM   b

UNION ALL

SELECT `Timeperiod`    AS `Timeperiod`    ,
       "com"                 AS `Domain`,
       `Content Groups`AS `Content Groups`,
       ...
FROM   c

UNION ALL

SELECT `Timeperiod`    AS `Timeperiod`    ,
       "fr"                 AS `Domain`,
       `Content Groups`AS `Content Groups`,
       ...
FROM   d

UNION ALL

SELECT `Timeperiod`    AS `Timeperiod`    ,
       "it"                 AS `Domain`,
       `Content Groups`AS `Content Groups`,
       ...
FROM   e;

CREATE OR REPLACE VIEW 2 AS
SELECT `Timeperiod`        AS `Timeperiod`       ,
       `Content Group`     AS `Content Group`    ,
       "at"                AS `Domain`,
       ...
FROM   f

UNION ALL

SELECT `Timeperiod`        AS `Timeperiod`       ,
       `Content Group`     AS `Content Group`    ,
       "com"               AS `Domain`,
       ...
FROM g

UNION ALL

SELECT `Timeperiod`        AS `Timeperiod`       ,
       `Content Group`     AS `Content Group`    ,
       "com"               AS `Domain`,
       ...
FROM h

UNION ALL

SELECT `Timeperiod`        AS `Timeperiod`       ,
       `Content Group`     AS `Content Group`    ,
       "fr"                AS `Domain`,
       ...
FROM   i

UNION ALL

SELECT `Timeperiod`        AS `Timeperiod`       ,
       `Content Group`     AS `Content Group`    ,
       "it"                AS `Domain`,
       ...    
FROM   j;

CREATE OR REPLACE VIEW 3 AS
SELECT CG.`Domain`                                        AS `Domain`             ,
       TP.`TimeperiodAlias`                               AS `Timeperiod`         ,
       CG.`Content Groups`                                AS `Content Group`      ,
       M.`InternalName`                                   AS `Internal Model Name`,
       ...
FROM   1 CG                      ,
       Timperiods TP             ,
       Models M
WHERE  CG.`Content Groups` LIKE CONCAT(M.`ContentGroupName`, '%')
AND    CG.`Timeperiod`        = TP.`Timeperiod`;

CREATE OR REPLACE VIEW 4 AS
SELECT CGD.`Domain`                                              AS `Domain`        ,
       TP.`TimeperiodAlias`                                      AS `Timeperiod`    ,
       CGD.`Content Group`                                       AS `Content Group`,
       ...
FROM   2 CGD,
       Timeperiods TP                 ,
       Models M 
WHERE  CGD.`Content Group` LIKE CONCAT(M.`ContentGroupName`, '%')
AND    CGD.`Timeperiod`       = TP.`Timeperiod`;

DROP TABLE IF EXISTS 5;

CREATE TABLE IF NOT EXISTS 5
             (
                          `Domain`     VARCHAR(3) NOT NULL ,
                          `Timeperiod` VARCHAR(30) NOT NULL,
                          `Content Group` varchar(70),
                          `Internal Model Name` VARCHAR(50),
                          ...
                           PRIMARY KEY (`Domain`,`Timeperiod`, `Content Group`)
             )    
AS        

SELECT CG.`Domain`              AS `Domain`             ,
       CG.`Timeperiod`          AS `Timeperiod`         ,
       CG.`Content Group`       AS `Content Group`      ,
       CG.`Internal Model Name` AS `Internal Model Name`,
       ...
FROM   3 CG,
       4 CGD
WHERE  CG.`Content Group` = CGD.`Content Group`
AND    CG.`Timeperiod`    = CGD.`Timeperiod`
AND    CG.`Domain`        = CGD.`Domain`;

These are the number of rows of the steps:

1: 64763 2: 51932

Timeperiods: 36

Models: 15

3: 2706

4: 2172

This is the EXPLAIN:

'1', 'PRIMARY', 'M', 'ALL', NULL, NULL, NULL, NULL, '15', ''
'1', 'PRIMARY', 'M', 'index', NULL, 'CGIndex', '242', NULL, '15', 'Using index; Using join buffer'
'1', 'PRIMARY', '<derived3>', 'ALL', NULL, NULL, NULL, NULL, '9528', 'Using where; Using join buffer'
'1', 'PRIMARY', 'TP', 'eq_ref', 'PRIMARY', 'PRIMARY', '65', 'CG.Timeperiod', '1', ''
'1', 'PRIMARY', '<derived9>', 'ALL', NULL, NULL, NULL, NULL, '21226', 'Using where; Using join buffer'
'1', 'PRIMARY', 'TP', 'eq_ref', 'PRIMARY', 'PRIMARY', '65', 'CGD.Timeperiod', '1', 'Using where'
'9', 'DERIVED', 'ContentGroupDuration_jMKL35_ALL', 'ALL', NULL, NULL, NULL, NULL, '17794', ''
'10', 'UNION', 'ContentGroupDurationVisitDuration_k4cZ5M_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'11', 'UNION', 'ContentGroupDurationVisitDuration_k4cZ5M_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'12', 'UNION', 'ContentGroupDuration_jMKL35_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'13', 'UNION', 'ContentGroupDuration_jMKL35_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
NULL, 'UNION RESULT', '<union9,10,11,12,13>', 'ALL', NULL, NULL, NULL, NULL, NULL, ''
'3', 'DERIVED', 'ContentGroups_fd33ef1_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'4', 'UNION', 'ContentGroups_fd33ef1_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'5', 'UNION', 'ContentGroups_fd33ef1_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
'6', 'UNION', 'ContentGroups_fd33ef1_ALL', 'ALL', NULL, NULL, NULL, NULL, '10476', ''
'7', 'UNION', 'ContentGroups_fd33ef1_ALL', 'ALL', NULL, NULL, NULL, NULL, '1', ''
NULL, 'UNION RESULT', '<union3,4,5,6,7>', 'ALL', NULL, NULL, NULL, NULL, NULL, ''

Does anyone know a way how to fasten the query and/or how to avoid the connection abort?

Solution:

Problem 1: execute "set wait_timeout=2147483" from command line (not inside sql)

Problem 2: store intermediate results in temporary tables and add indexes. Then perform the large join.

Best

Christian

1

There are 1 answers

5
BuZz On BEST ANSWER

I'd say there's two ways : - either change the timeout on non interactive connections (wait_timeout for Mysql) - or optimize somehow your table structure

I've been working on large commercial databases in the past, the performance of you joins relates more to the way your tables are indexes than to the amount of lines fetched. Make sure the right tables have the right keys, try to augment those if possible.

Anyway, change the wait_timeout, a connection shouldnt die so soon if you allow for long and complex queries.

To change the timeout, log in as root to mysql : mysql -u root -p, input password, and enter the following : set global wait_timeout=2147483

thats the max value corresponding to 23 days on windows. It can get much higher on Linux distribs, but well you don't need that long anyway.

cheers,