I am totally new to MySQL and appreciate very much for your patience. I have the following MySQL table:
| fact_id | from_id | relation_id | to_id | link_time |
|---|---|---|---|---|
| 1 | 19 | 6 | 151 | 1 |
| 2 | 2233 | 2 | 57 | 1 |
| 3 | 182 | 23 | 112 | 1 |
| 4 | 22 | 17 | 21 | 1 |
| 5 | 3 | 8 | 742 | 1 |
| 6 | 507 | 2 | 55 | 1 |
| 7 | 154 | 25 | 56 | 1 |
| 8 | 100 | 83 | 18 | 1 |
| 9 | 1110 | 2 | 31 | 1 |
| 10 | 141 | 29 | 7 | 1 |
| ... | ... | ... | ... | ... |
with the corresponding init code:
create table icews14s
(
fact_id int auto_increment
primary key,
from_id int not null,
relation_id int not null,
to_id int not null,
link_time int not null
);
create index icews14s_from_id_index
on icews14s (from_id);
create index icews14s_link_time_index
on icews14s (link_time);
create index icews14s_to_id_index
on icews14s (to_id);
and one long query:
with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
pre_nodes_1 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)),
pre_nodes_2 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
2 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_1)),
pre_nodes_3 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
3 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_2))
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
union select * from pre_nodes_1
union select * from pre_nodes_2
union select * from pre_nodes_3
order by fact_id desc limit 30;
the corresponding result is :
| fact_id | from_id | relation_id | to_id | link_time | degree |
|---|---|---|---|---|---|
| 69298 | 1659 | 16 | 269 | 285 | 0 |
| 60977 | 1659 | 37 | 3176 | 253 | 1 |
| 58981 | 3176 | 1 | 3281 | 245 | 2 |
| 58757 | 1659 | 8 | 1884 | 245 | 1 |
| 58722 | 1659 | 0 | 1884 | 245 | 1 |
| 39282 | 1659 | 1 | 105 | 163 | 1 |
| 38740 | 105 | 7 | 1143 | 161 | 2 |
| 38570 | 105 | 29 | 1815 | 161 | 2 |
| 38440 | 105 | 19 | 2 | 160 | 2 |
| 38101 | 2 | 28 | 52 | 159 | 3 |
| 38061 | 105 | 0 | 581 | 158 | 2 |
| 37825 | 2 | 14 | 1057 | 157 | 3 |
| 37822 | 2 | 2 | 228 | 157 | 3 |
| 37606 | 2 | 2 | 1006 | 156 | 3 |
| 37597 | 2 | 9 | 9 | 156 | 3 |
| 37554 | 2 | 2 | 9 | 156 | 3 |
| 37390 | 2 | 99 | 9 | 156 | 3 |
| 37322 | 2 | 8 | 48 | 155 | 3 |
| 37277 | 2 | 2 | 9 | 155 | 3 |
| 37266 | 2 | 9 | 1068 | 155 | 3 |
| 37210 | 2 | 8 | 239 | 155 | 3 |
| 37120 | 2 | 2 | 90 | 155 | 3 |
| 37032 | 2 | 2 | 9 | 154 | 3 |
| 36993 | 2 | 8 | 28 | 154 | 3 |
| 36988 | 2 | 71 | 136 | 154 | 3 |
| 36971 | 2 | 2 | 90 | 154 | 3 |
| 36949 | 2 | 9 | 48 | 154 | 3 |
| 36896 | 2 | 29 | 9 | 154 | 3 |
| 36827 | 2 | 28 | 309 | 154 | 3 |
| 36798 | 2 | 10 | 52 | 153 | 3 |
The question is, this query is very slow in such a 100000 record table.
Is there any methods to make it quicker?
I tried to solve using recrusive temp table, but this statement never ends:
WITH RECURSIVE pre_nodes AS (
SELECT
fact_id,
from_id,
relation_id,
to_id,
link_time,
0 AS degree
FROM
icews14s
WHERE
fact_id = 69298
UNION
SELECT
o.fact_id,
o.from_id,
o.relation_id,
o.to_id,
o.link_time,
n.degree + 1
FROM
icews14s o
JOIN
pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
(o.from_id = n.to_id AND o.link_time < n.link_time))
WHERE
o.fact_id != 69298
)
SELECT distinct
fact_id,
from_id,
relation_id,
to_id,
link_time,
degree
FROM
pre_nodes
ORDER BY
fact_id DESC
LIMIT
30;
Your first query can be simplified quite a bit. Let's take the
pre_nodes_1cte as an example:As O. Jones pointed out in the comments, the nesting of
(select * from icews14s)is unnecessary, but the optimizer should spot this and remove the nesting. Theleft joinwill be implicitly turned into aninner joindue to the criterion ont.fact_id. Given thet.fact_id in (select fact_id from target), I think the cte can be rewritten as:The same pattern applies to the other two
pre_nodes_*ctes. Given theo.link_time < t.link_timecriterion in each of the joins, it should not be necessary to dedupe theUNION, so switching toUNION ALLwill reduce overhead by removing the dedupe step.This query will probably benefit from a composite index on
(from_id, link_time):For your recursive cte, I suspect the performance issue is due to the join condition. Without a reasonable amount of test data I cannot test this, but I suspect rewriting the recursive part of the cte like this may allow it to use an index: