I am totally new to MySQL and appreciate very much for your patience. I have the following MySQL table:
fact_id | from_id | relation_id | to_id | link_time |
---|---|---|---|---|
1 | 19 | 6 | 151 | 1 |
2 | 2233 | 2 | 57 | 1 |
3 | 182 | 23 | 112 | 1 |
4 | 22 | 17 | 21 | 1 |
5 | 3 | 8 | 742 | 1 |
6 | 507 | 2 | 55 | 1 |
7 | 154 | 25 | 56 | 1 |
8 | 100 | 83 | 18 | 1 |
9 | 1110 | 2 | 31 | 1 |
10 | 141 | 29 | 7 | 1 |
... | ... | ... | ... | ... |
with the corresponding init code:
create table icews14s
(
fact_id int auto_increment
primary key,
from_id int not null,
relation_id int not null,
to_id int not null,
link_time int not null
);
create index icews14s_from_id_index
on icews14s (from_id);
create index icews14s_link_time_index
on icews14s (link_time);
create index icews14s_to_id_index
on icews14s (to_id);
and one long query:
with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
pre_nodes_1 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)),
pre_nodes_2 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
2 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_1)),
pre_nodes_3 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
3 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_2))
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
union select * from pre_nodes_1
union select * from pre_nodes_2
union select * from pre_nodes_3
order by fact_id desc limit 30;
the corresponding result is :
fact_id | from_id | relation_id | to_id | link_time | degree |
---|---|---|---|---|---|
69298 | 1659 | 16 | 269 | 285 | 0 |
60977 | 1659 | 37 | 3176 | 253 | 1 |
58981 | 3176 | 1 | 3281 | 245 | 2 |
58757 | 1659 | 8 | 1884 | 245 | 1 |
58722 | 1659 | 0 | 1884 | 245 | 1 |
39282 | 1659 | 1 | 105 | 163 | 1 |
38740 | 105 | 7 | 1143 | 161 | 2 |
38570 | 105 | 29 | 1815 | 161 | 2 |
38440 | 105 | 19 | 2 | 160 | 2 |
38101 | 2 | 28 | 52 | 159 | 3 |
38061 | 105 | 0 | 581 | 158 | 2 |
37825 | 2 | 14 | 1057 | 157 | 3 |
37822 | 2 | 2 | 228 | 157 | 3 |
37606 | 2 | 2 | 1006 | 156 | 3 |
37597 | 2 | 9 | 9 | 156 | 3 |
37554 | 2 | 2 | 9 | 156 | 3 |
37390 | 2 | 99 | 9 | 156 | 3 |
37322 | 2 | 8 | 48 | 155 | 3 |
37277 | 2 | 2 | 9 | 155 | 3 |
37266 | 2 | 9 | 1068 | 155 | 3 |
37210 | 2 | 8 | 239 | 155 | 3 |
37120 | 2 | 2 | 90 | 155 | 3 |
37032 | 2 | 2 | 9 | 154 | 3 |
36993 | 2 | 8 | 28 | 154 | 3 |
36988 | 2 | 71 | 136 | 154 | 3 |
36971 | 2 | 2 | 90 | 154 | 3 |
36949 | 2 | 9 | 48 | 154 | 3 |
36896 | 2 | 29 | 9 | 154 | 3 |
36827 | 2 | 28 | 309 | 154 | 3 |
36798 | 2 | 10 | 52 | 153 | 3 |
The question is, this query is very slow in such a 100000 record table.
Is there any methods to make it quicker?
I tried to solve using recrusive temp table, but this statement never ends:
WITH RECURSIVE pre_nodes AS (
SELECT
fact_id,
from_id,
relation_id,
to_id,
link_time,
0 AS degree
FROM
icews14s
WHERE
fact_id = 69298
UNION
SELECT
o.fact_id,
o.from_id,
o.relation_id,
o.to_id,
o.link_time,
n.degree + 1
FROM
icews14s o
JOIN
pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
(o.from_id = n.to_id AND o.link_time < n.link_time))
WHERE
o.fact_id != 69298
)
SELECT distinct
fact_id,
from_id,
relation_id,
to_id,
link_time,
degree
FROM
pre_nodes
ORDER BY
fact_id DESC
LIMIT
30;
Your first query can be simplified quite a bit. Let's take the
pre_nodes_1
cte as an example:As O. Jones pointed out in the comments, the nesting of
(select * from icews14s)
is unnecessary, but the optimizer should spot this and remove the nesting. Theleft join
will be implicitly turned into aninner join
due to the criterion ont.fact_id
. Given thet.fact_id in (select fact_id from target)
, I think the cte can be rewritten as:The same pattern applies to the other two
pre_nodes_*
ctes. Given theo.link_time < t.link_time
criterion in each of the joins, it should not be necessary to dedupe theUNION
, so switching toUNION ALL
will reduce overhead by removing the dedupe step.This query will probably benefit from a composite index on
(from_id, link_time)
:For your recursive cte, I suspect the performance issue is due to the join condition. Without a reasonable amount of test data I cannot test this, but I suspect rewriting the recursive part of the cte like this may allow it to use an index: