Post optimization needed after deleting rows in a MYSQL Database

7.1k views Asked by At

I have a log table that is currently 10GB. It has a lot of data for the past 2 years, and I really feel at this point I don't need so much in there. Am I wrong to assume it is not good to have years of data in a table (a smaller table is better)?

My tables all have an engine of MYISAM.

I would like to delete all data of 2014 and 2015, and soon i'll do 2016, but i'm concerned about after I run the DELETE statement, what exactly will happen. I understand because it's ISAM there is a lock that will occur where no writing can take place? I would probably delete data by the month, and do it late at night, to minimize this as it's a production DB.

My prime interest, specifically, is this: should I take some sort of action after this deletion? Do I need to manually tell MYSQL to do anything to my table, or is MYSQL going to do all the housekeeping itself, reclaiming everything, reindexing, and ultimately optimizing my table after the 400,000k records I'll be deleting.

Thanks everyone!

2

There are 2 answers

4
Rick James On BEST ANSWER

Plan A: Use a time-series PARTITIONing of the table so that future deletions are 'instantaneous' because of DROP PARTITION. More discussion here . Partitioning only works if you will be deleting all rows older than X.

Plan B: To avoid lengthy locking, chunk the deletes. See here . This is optionally followed by an OPTIMIZE TABLE to reclaim space.

Plan C: Simply copy over what you want to keep, then abandon the rest. This is especially good if you need to preserve only a small proportion of the table.

CREATE TABLE new LIKE real;
INSERT INTO new
    SELECT * FROM real
        WHERE ... ;  -- just the newer rows;
RENAME TABLE real TO old, new TO real;   -- instantaneous and atomic
DROP TABLE old;    -- after verifying that all went well.

Note: The .MYD file contains the data; it will never shrink. Deletes will leave holes in it. Further inserts (and opdates) will use the holes in preference to growing the table. Plans A and C (but not B) will avoid the holes, and truly free up space.

0
zedfoxus On

Tim and e4c5 have given some good recommendations and I urge them to add their answers.

You can run OPTIMIZE TABLE after doing the deletes. Optimize table will help you with a few things (taken from the docs):

  • If the table has deleted or split rows, repair the table.
  • If the index pages are not sorted, sort them.
  • If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.

According to the docs: http://dev.mysql.com/doc/refman/5.7/en/optimize-table.html

Use OPTIMIZE TABLE in these cases, depending on the type of table:

...

After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.