MySQL - Getting results on a range by column value

359 views Asked by At

Solution B

As I accepted @Strawberry answer and this solution is out of the question of the post itself, I won't write it as an answer, but leave it here to anyone to whom it may be useful.

The main problem is a query performance and structure problem. The fact I am searching for all the attributes of all slides and after that limiting the results is making it go so slow. The solution would be to get first the slides I want to get and, then, searching all the extra information from that shorter set of results.

So, the final query (which runs under 0.2 seconds and gets all the results I wish) would be the following:

SELECT 
    sl.slide_id AS slide_id,        
    sl.time_in AS time_in,
    sl.time_out AS time_out, 
    sl.duration AS duration, 
    sl.slide_order AS slide_order,
    sl.title AS title,
    sl.slide_type AS slide_type, 
    sl.report_id AS report_id, 
    sltg.tag_category_id AS tag_category_id,
    sltg.tag_value_id AS tag_value_id,
    sltgc.txt AS tag_category_text,
    sltgv.txt AS tag_value_text,
FROM (SELECT 
        sl.time_in AS time_in,
        sl.time_out AS time_out, 
        sl.duration AS duration, 
        sl.slide_order AS slide_order,
        sl.media_id AS media_id,
        sl.title AS title,
        sl.slide_type AS slide_type, 
        slrep.report_id AS report_id, 
        slrep.slide_id AS slide_id
    FROM er_slides sl 
    INNER JOIN er_slides_in_report slrep 
        ON slrep.slide_id = sl.unique_id 
            AND slrep.report_id IN (1461317308472,1461597566425,1461598458236)  
            AND slrep.deleted_date IS NULL 
    LIMIT 0, 5
) sl  
INNER JOIN er_slides_tags sltg 
    ON sltg.deleted_date IS NULL 
        AND sltg.slide_id = sl.slide_id 
INNER JOIN er_slide_tags_categories sltgc
    ON sltgc.id = sltg.tag_category_id 
INNER JOIN er_slide_tags_values sltgv
    ON sltgv.id = sltg.tag_value_id 
ORDER BY slide_id, tag_value_id;

As you can see, I limit the set before searching for anything else. After that, the query runs so fast. I hope that this approach helps someone to improve their own queries, taking my original one as an example of what you must NOT do.

The original question

I have a table with some elements (slides) which are grouped on some containers (reports), each one of those elements with some attributes (tag values) and categories (tag categories) assigned to them. It would be like a tagging system, where an element X can have the tag Y from the category Z.

Database structure

CREATE TABLE IF NOT EXISTS `er_reports` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `unique_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL DEFAULT '0',
  `name` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
  `user_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  `author` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
  `report_status` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'incomplete',
  `num_slides` int(4) NOT NULL DEFAULT '0',
  `report_type` varchar(25) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'report',
  `target_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `created_date` datetime DEFAULT NULL,
  `uploaded_date` datetime DEFAULT NULL,
  `deleted_date` datetime DEFAULT NULL,
  `modified_date` datetime DEFAULT NULL,
  `item_reference` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `is_favourite` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_id` (`unique_id`),
  KEY `unique_id_2` (`unique_id`),
  KEY `unique_id_3` (`unique_id`),
  KEY `user_id` (`user_id`),
  KEY `deleted_date` (`deleted_date`),
  KEY `is_favourite` (`is_favourite`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=714 ;


CREATE TABLE IF NOT EXISTS `er_slides` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `unique_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL DEFAULT '0',
  `report_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `action_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `media_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `thumbnail_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `media_offset` int(6) NOT NULL DEFAULT '0',
  `time_in` decimal(9,3) DEFAULT NULL,
  `time_out` decimal(9,3) DEFAULT NULL,
  `duration` decimal(9,3) NOT NULL DEFAULT '10.000',
  `title` text COLLATE utf8_unicode_ci,
  `slide_comment` text COLLATE utf8_unicode_ci,
  `note_id` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `content` text COLLATE utf8_unicode_ci,
  `media_object` text COLLATE utf8_unicode_ci,
  `slide_type` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'action',
  `user_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  `slide_order` int(4) NOT NULL DEFAULT '0',
  `count_slide` tinyint(1) NOT NULL DEFAULT '1',
  `visible` tinyint(1) NOT NULL DEFAULT '1',
  `deleted_date` datetime DEFAULT NULL,
  `created_date` datetime DEFAULT NULL,
  `uploaded_date` datetime DEFAULT NULL,
  `modified_date` datetime DEFAULT NULL,
  `item_reference` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `unique_id` (`unique_id`),
  KEY `report_id` (`report_id`),
  KEY `deleted_date` (`deleted_date`),
  KEY `action_id` (`action_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=31899 ;


CREATE TABLE IF NOT EXISTS `er_slides_in_report` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `report_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  `slide_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  `slide_order` int(3) NOT NULL DEFAULT '1',
  `added_date` datetime DEFAULT NULL,
  `modified_date` datetime DEFAULT NULL,
  `deleted_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `report_id` (`report_id`,`slide_id`,`deleted_date`),
  KEY `slide_order` (`slide_order`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=16843 ;


CREATE TABLE IF NOT EXISTS `er_slides_tags` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `slide_id` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
  `tag_category_id` bigint(20) NOT NULL,
  `tag_value_id` bigint(20) NOT NULL,
  `created_date` datetime NOT NULL,
  `deleted_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `slide_id` (`slide_id`,`tag_category_id`,`tag_value_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=220623 ;    

CREATE TABLE IF NOT EXISTS `er_slide_tags_categories` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `txt` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `txt` (`txt`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=17 ;


CREATE TABLE IF NOT EXISTS `er_slide_tags_values` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `txt` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `txt` (`txt`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=439 ;

What I've tried

SELECT @slide_num := if(@sl_id = slrep.slide_id collate utf8_unicode_ci, @slide_num, @slide_num := @slide_num+1) as slide_num,
    @sl_id := slrep.slide_id collate utf8_unicode_ci as sl_id,
    slrep.report_id AS report_id, 
    slrep.slide_id AS slide_id,
    sltg.tag_category_id AS tag_category_id,
    sltg.tag_value_id AS tag_value_id,
    sltgc.txt AS tag_category_text,
    sltgv.txt AS tag_value_text,
    sl.time_in AS time_in,
    sl.time_out AS time_out, 
    sl.duration AS duration, 
    sl.slide_order AS slide_order,
    sl.title AS title,
    sl.slide_type AS slide_type 
    FROM (SELECT @sl_id:=1, @slide_num := 0) v, er_slides sl
    INNER JOIN er_slides_in_report slrep 
        ON slrep.slide_id = sl.unique_id 
            AND slrep.report_id IN (1461317308472,1461597566425,1461598458236) 
            AND slrep.deleted_date IS NULL 
    INNER JOIN er_slides_tags sltg 
        ON sltg.deleted_date IS NULL 
            AND sltg.slide_id = sl.unique_id 
    INNER JOIN er_slide_tags_categories sltgc
        ON sltgc.id = sltg.tag_category_id 
    INNER JOIN er_slide_tags_values sltgv
        ON sltgv.id = sltg.tag_value_id 
    WHERE @slide_num >= 0 AND @slide_num <= 5
    ORDER BY slide_id, tag_value_id;

I've put some fake report IDs for you to see how the query is constructed.

The problem

The problem is that this is not enough fast - it takes almost 3 seconds to get the information for 5 slides, 200 rows - and that I cannot modify the from limit. If I write:

WHERE @slide_num >= 10 AND @slide_num <= 15

I get an empty result (and, of course, I have checked there are enough slides).

I cannot understand neither why it takes 3 seconds to get 200 rows.

What I need

I'd need to be able to query only those slides which are between the selected range, which is dynamic, in the fastest way possible.

If you find something is missing, comment what is it so I can post it.

Thank you.

EDIT: Explain Query (Strawberry approach)

As @strawberry suggested, I tried to apply his approach. However, the response time of the query is the same writing a range from BETWEEN 0 AND 5 than writing BETWEEN 0 AND 200 (around 17 seconds both).

As this may be because of a bad indexing, I've decided to write the EXPLAIN here, as I cannot see any bad indexing (every condition put in a WHERE clause has its index).

EXPLAIN result of @strawberry approach applied to my query

1

There are 1 answers

6
Strawberry On BEST ANSWER

Consider this simplified example...

DROP TABLE IF EXISTS slides;

CREATE TABLE slides 
(slide_id INT NOT NULL);

INSERT INTO slides VALUES
(1),
(2),
(4),
(5),
(6),
(7);

DROP TABLE IF EXISTS slides_tags;

CREATE TABLE slides_tags
(slide_id INT NOT NULL
,tag_id INT NOT NULL
,PRIMARY KEY(slide_id,tag_id)
);

INSERT INTO slides_tags VALUES
(1,101),
(1,103),
(1,105),
(1,107),
(2,102),
(2,104),
(2,106),
(2,108),
(4,105),
(4,110),
(5,101);

SELECT slide_id
     , tag_id 
     , i
  FROM 
     ( SELECT s.*
            , st.tag_id
            , CASE WHEN @prev = s.slide_id THEN @i:=@i ELSE @i:=@i+1 END i
            , @prev:=s.slide_id 
         FROM slides s 
         LEFT 
         JOIN slides_tags st 
           ON st.slide_id = s.slide_id 
         JOIN 
            ( SELECT @prev:=null,@i:=0) vars 
        ORDER 
           BY s.slide_id
     ) x 
 WHERE i BETWEEN 3 AND 5;
+----------+--------+------+
| slide_id | tag_id | i    |
+----------+--------+------+
|        4 |    105 |    3 |
|        4 |    110 |    3 |
|        5 |    101 |    4 |
|        6 |   NULL |    5 |
+----------+--------+------+

I've inlcuded the i column in the result for clarity. Of course, if it's not needed, omit it from the superquery.

EDIT:

It seems you can rewrite this query as follows but, to be honest, I'm baffled as to why this works:

SELECT s.slide_id 
     , st.tag_id
     , CASE WHEN @prev = s.slide_id THEN @i:=@i ELSE @i:=@i+1 END i
     , @prev:=s.slide_id 
  FROM (SELECT @i:=0, @prev := 0) vars
  JOIN slides s
  LEFT
  JOIN slides_tags st
    ON st.slide_id = s.slide_id
 HAVING i BETWEEN 3 AND 5
 ORDER 
    BY slide_id
     , tag_id;

+----------+--------+------+-------------------+
| slide_id | tag_id | i    | @prev:=s.slide_id |
+----------+--------+------+-------------------+
|        4 |    105 |    3 |                 4 |
|        4 |    110 |    3 |                 4 |
|        5 |    101 |    4 |                 5 |
|        6 |   NULL |    5 |                 6 |
+----------+--------+------+-------------------+