How to delete millions of records from a mysql table without slowdown

1.6k views Asked by At

Is there a good way to delete lots of records without slowing down a website?

I need to delete millions of records from a MySQL table that has no indexes and no primary key. I read on SO and various tutorials on the web that the basic strategy is to limit the delete query, sleep for a second or two between deletes, and repeat the process until finished. I also (using PDO) am running a commit after all loops complete.

That worked fine last week, but every time I ran the script, the database slowed down and we received many complaints about the site being slow etc. This is on a Miva Merchant baskets table, not that it really matters.

I'm almost done trimming the table so I could just suffer through it and finish. But there must be a better way...?

Here is the relevant code:

$database->beginTransaction();
$selectLimit = 4900; // mysql will lock the entire table at 5000+.....
$loopLimit = 10;
$date = "1456272001"; // 2016-02-24

for( $i = 0; $i < $loopLimit; $i++ ) {
    $startTime = time();
    $oldBaskets = $database->prepare("DELETE FROM s01_Baskets WHERE CAST(lastupdate AS UNSIGNED) < '" . $date . "' LIMIT " . $selectLimit . "");
    if ( $oldBaskets->execute() ) {
        $deletes = $oldBaskets->rowCount();
        $totalDeletes += $deletes;
        $duration = time() - $startTime;
        echo "\ndeleted '" . $deletes . "' entries";
        echo "\n-- took '" . $duration . "' seconds";
    }
    sleep(2);
}
$database->commit();
2

There are 2 answers

2
Quassnoi On

Create an index on lastupdate and modify your query a little:

DELETE
FROM    s01_Baskets
WHERE   lastupdate < :date
ORDER BY
        lastupdate
LIMIT   :limit

Having an index on lastupdate will allow MySQL to use it both for ordering and filtering, so only the records which have to be deleted will be visited by the engine.

Without an index, MySQL has to examine all the records in your database as it reads them, before it reaches the limit.

Using CAST on an indexed field in MySQL makes the expression unsargable (unable to use the index for filtering), that's why you should convert the expression you are comparing against ($date), not vice versa.

0
MonkeyZeus On

Since it sounds like you have no indexes and no auto-incremented IDs I would personally go for direct SQL like this:

Note: You should probably do this when there is minimal activity on the system

RENAME TABLE s01_Baskets TO s01_Baskets_to_be_deleted;

CREATE TABLE s01_Baskets LIKE s01_Baskets_to_be_deleted;

INSERT INTO s01_Baskets (col1, col2, ..., coln)
SELECT *
FROM s01_Baskets_to_be_deleted
WHERE lastupdate >= '2016-02-24 00:00:00';

DROP TABLE s01_Baskets_to_be_deleted;

The first two should execute relatively quickly and your users will not notice a slowdown. All of their interaction will simply be routed to your new empty table.

The third command will re-insert the records your wish to keep.

As for the DROP command, it might slow down the DB a little in terms of disk I/O but since none of the records are being interacted with then your users should experience almost no slowdown.


Also, another reason that deletion is so slow and intense is because MySQL will log each row and if you have any active triggers then those must be executed before the delete can be performed.