I have a table named "article" containing 1,145,141 records with various fields including
"id," "uiqPID," "tenant," "status," "title," "body," "user_id," "category_id," "created_at," and "updated_at."
The "status" column can have one of three values: 'PUBLISHED,' 'DRAFT,' or 'ARCHIVED,' with the following counts:
- PUBLISHED: 2
- DRAFT: 26,145
- ARCHIVED: 1,118,993
I have set up the following indexes:
- PRIMARY index on ('id')
- article_abstract_unq_id index on ('uiqPID', 'tenant')
- article_status_idx index on ('status')
- idx_composite_search index on ('id', 'uiqPID', 'created_at')
My issue is with the performance of the following query, which took 5.7 seconds to execute:
SELECT
a.id AS id,
a.created_at AS created_at
FROM
article a
WHERE
a.status = 'ARCHIVED'
ORDER BY a.created_at DESC
LIMIT 50;
However, if I remove the WHERE condition or change it to a.status = 'DRAFT', the query completes within 1 second.
Upon inspecting the query plan, I noticed the difference in execution strategies. With the 'ARCHIVED' or 'DRAFT' status filter, the plan shows:
key: article_status_idx
Extra: Using index condition; Using filesort
But without the 'ARCHIVED' filter, the plan simply states:
key: NULL
Extra: Using filesort
My question is: How can I optimize the query performance for filtering on 'ARCHIVED' status, ensuring it executes faster than the current 5.7 seconds, similar to the queries without this condition or with 'DRAFT' status?
The behavior you experienced is actually what I would expect. Writing 1,118,993 id/timestamp tuples into memory takes time, especially if you order the results by created_at. You need to test the two queries below. If the time to execute the first one (the query without the WHERE clause) is roughly the same as executing the query with the WHERE clause, then it is not the WHERE clause that causes the performance drop, but ordering the result set and loading it into memory. Run the second one as well.
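Something along these lines, for example (the exact form is up to you; in the second one I also dropped the LIMIT so that every archived row is actually read):
-- Test 1: the same query, but without the status filter
SELECT a.id AS id, a.created_at AS created_at
FROM article a
ORDER BY a.created_at DESC
LIMIT 50;

-- Test 2: the status filter kept, but no ORDER BY (and no LIMIT)
SELECT a.id AS id, a.created_at AS created_at
FROM article a
WHERE a.status = 'ARCHIVED';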
The second query does not even have an ORDER BY. If it is similarly slow, then what you are mainly waiting for is writing this data into memory.
Anyway, run these measurements; the result will either confirm that the slowness is caused by the WHERE or ORDER BY clause, or rule them out. If those clauses turn out to be the cause, let me know in the comments and I will suggest ways to optimize them. But it is highly probable that the real problem is simply that you are waiting for the full query to execute.
It may be better to break the query into chunks and run queries with LIMIT 0, 10000, then LIMIT 10000, 10000, and so on, so you can already work with the first results while waiting for the rest, which may reduce the frustration of impatient users. That does not change the fact that loading all this data takes time, but showing partial results can make the wait friendlier if partial results are useful to you.
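A sketch of that chunked approach, reusing your query (the chunk size of 10,000 is only an example):
-- First chunk
SELECT a.id AS id, a.created_at AS created_at
FROM article a
WHERE a.status = 'ARCHIVED'
ORDER BY a.created_at DESC
LIMIT 0, 10000;

-- Second chunk, and so on
SELECT a.id AS id, a.created_at AS created_at
FROM article a
WHERE a.status = 'ARCHIVED'
ORDER BY a.created_at DESC
LIMIT 10000, 10000;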
EDIT
Before you create a copy table, you can try creating a composite (status, created_at) index; such an index may well help your performance, the idea being to have created_at stored in indexed order for each status value.
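For example (the index name is just a placeholder, pick whatever fits your conventions):
ALTER TABLE article ADD INDEX idx_status_created_at (status, created_at);

-- With this index, MySQL can resolve status = 'ARCHIVED' and read rows already
-- ordered by created_at from the index, avoiding the filesort
SELECT a.id AS id, a.created_at AS created_at
FROM article a
WHERE a.status = 'ARCHIVED'
ORDER BY a.created_at DESC
LIMIT 50;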
If all else fails, you could create a table like this:
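A minimal sketch of such an archive table; the column types and names here are assumptions, so adjust them to match your real schema, and copy over more columns if you need them:
CREATE TABLE article_archive (
    id BIGINT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_article_archive_created_at (created_at)
);

-- Populate it from the existing data
INSERT INTO article_archive (id, created_at)
SELECT id, created_at
FROM article
WHERE status = 'ARCHIVED';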
And then you can select from article_archive directly, as shown below. You can even join it back to article by id.
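For example, something like:
-- Newest archived rows straight from the archive table
SELECT aa.id, aa.created_at
FROM article_archive aa
ORDER BY aa.created_at DESC
LIMIT 50;

-- Join back to article by id when the full rows are needed
SELECT a.*
FROM article_archive aa
JOIN article a ON a.id = aa.id
ORDER BY aa.created_at DESC
LIMIT 50;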