How can I make grouping optimize?

78 views Asked by At

Here is my query:

EXPLAIN SELECT Count(1), 
       user_id, 
       type 
FROM   (SELECT e.user_id, 
               e.type, 
               Max(r.date_time) last_seen, 
               e.date_time      event_time 
        FROM   events e 
               JOIN requests r 
                 ON e.user_id = r.user_id 
                    AND e.type IN( 3, 5, 6 ) 
        GROUP  BY e.user_id, 
                  e.date_time, 
                  e.type 
        HAVING last_seen < event_time) x 
GROUP  BY user_id, 
          type

Also here is the result of EXPLAIN :

enter image description here

Also here is the result of that subquery (x) EXPLAIN:

enter image description here

See? Much optimal. So the issue is grouping here. Any idea how can I make that query better?


EDIT: We need two tables:

  1. requests table -- A new row will inserted inside it for each users request. So, the last (biggest) determines user's last time been online in our website somewhat.

  2. events table -- A new row will be inserted inside it for each answer, comment.

We're talking about a Q/A website. All we're trying to do is "sending an email to the users who got a new comment/answer after their last time being online in our website".

4

There are 4 answers

5
Gordon Linoff On BEST ANSWER

I would rewrite the query like this:

select user_id, type, count(*)
from (select e.user_id, e.type, e.date_time, 
             (select max(r.date_time)
              from requests r
              where r.user_id = e.user_id
              ) as last_seen 
       from events e 
       where e.type  in ( 3, 5, 6 ) 
      ) er
where last_seen < date_time
group by user_id, type;

Then, I want to be sure that there are indexes on requests(user_id, date_time) and events(type, user_id, date_time).

11
DRapp On

you need proper index on your table to match both the WHERE clause and Order by to help optimize.

table      index on...
events     ( type, user_id, date_time )
requests   ( user_id, date_time ) 

I might even suggest slight adjust of query.
Change your

AND e.type IN( 3, 5, 6 ) 

to

WHERE e.type IN( 3, 5, 6 ) 

Because the "e.Type" is based on your primary table of the query and has nothing to do with the actual JOIN to the requests table. The join should represent actual columns to qualify between tables.

SUGGESTION post edit to question. I might offer an alternate option. Add a column to your user table for "lastRequest" date/time field. Then, whenever a request is entered for that user, update the field in the user table. You don't need to keep subquery max() to find out when. This might simplify your query to something like... As your request table gets larger, so too will your query time. By looking directly at the user table ONCE for the already known latest request, you have your answer. Query 10k users, or 2mil requests... your choice to plow through :)

select 
      u.user_id,
      e.type,
      count(*) CountPerType,
      min( e.date_time ) firstEventDateAfterUsersLastRequest
   from
      user u
         join events e 
            on u.user_id = e.user_id
           AND e.type in ( 3, 5, 6 )
           AND e.date_time > u.lastRequest
   group by
      u.user_id,
      e.type

So your join already has an underlying date/time per user and you can just look for those records coming in AFTER the person last requested something (hence follow-up).

Then, to prepare the new column in your user table you could just update with the max( request.date_time ) per user.

If a person is active as of ex: Nov 27th, and there are 5 responses to 3 different event types AFTER that, you still get that person per their Nov 27th date, yet other people could have newer or older "latestRequest" dates.

Just an optional thought..

3
Alex On

http://sqlfiddle.com/#!9/c73878/1

ALTER TABLE `events` ADD INDEX e_type (type);
ALTER TABLE `events` ADD INDEX user_time (user_id, date_time);
ALTER TABLE requests ADD INDEX user_time (user_id, date_time);

SELECT  COUNT(*),
        e.user_id, 
        e.type
FROM `events` e 
JOIN  (
  SELECT user_id, Max(r.date_time) last_seen
  FROM requests r 
  GROUP BY user_id
) r
ON e.user_id = r.user_id 
   AND e.date_time > r.last_seen
WHERE e.type IN( 3, 5, 6 ) 
GROUP  BY e.user_id,  
       e.type 
0
Rick James On

See if this gets the 'right' answers:

SELECT  COUNT(DISTINCT(e.date_time),
        e.user_id, e.type
    FROM  events e
    JOIN  requests r  ON  e.user_id = r.user_id
                     AND  e.type IN( 3, 5, 6 )
    GROUP BY  e.user_id, e.type
    HAVING  MAX(r.date_time) < e.event_time

Indexes:

e:  INDEX(type)   -- may be useful (depends on cardinality)
r:  INDEX(user_id, date_time)  -- in this order