Mysql query still to slow for 100 million records in 10gb database size with index

1k views Asked by At

I have a pretty huge data set of products and users and their time of usage.

There are around 100 million rows and takes around 10 GB of disk space.

the dataset is on the following order:

userid     itemid      purchase_date    
1             1          2018-12-22
11            1          2018-12-22
11            4          2018-12-22
12            4          2018-12-22
11            5          2018-12-22

.......100M+ rows.....

I also added such index,

ALTER TABLE purchase_data ADD INDEX (userid);
ALTER TABLE purchase_data ADD INDEX (itemid);
ALTER TABLE purchase_data ADD INDEX (purchase_date);

Let say I want to find all the users who purchased a product(item 1) and then find all the other item he purchased.

Select itemid from purchase_data
    where userid in (Select userid, from purchase_data
                    where itemid=1)
      and itemid!=1

this query takes forever to run.

Secondly, I also want to add up all the common items between those users like userid 11 4 and userid 12 also brought 4 so I want to add 4 with count 2

I wrote a similar query for that:

Select itemid,count(*) from purchase_data
    where userid in (Select userid, from purchase_data
                      where itemid=1)
      and itemid!=1
    group by itemid
    having count(itemid)>=1

This script also takes infinite time.

Please help,

Thank You

1

There are 1 answers

1
ScaisEdge On

Instead of IN clause you should use a inner join eg:

Select itemid 
from purchase_data  a 
INNER JOIN  (
    Select userid
     from purchase_data where itemid=1
    ) T on t.userid = a,userid 
  where a.itemid != 1 

a IN clause work as several OR condition while the inner join work as a single relation ..

And Instead of several index with one column you should delete these index and create an composite index with the column involved in join condition on left and others column on the right

create index my_idx on  purchase_data(userid, itemid );

the same for the grouped query

Select itemid , count(*)
from purchase_data  a 
INNER JOIN  (
    Select userid
     from purchase_data where itemid=1
    ) T on t.userid = a,userid 
  where itemid != 1 
group by itemid 
having count(itemid)>=1