How to check the integrity of data backups


I have important data that I am backing up locally. I have a working copy of the data on one local hard drive, a backup on an external hard drive, and another copy on an offsite server.

Is there a good way for me to do a regular (e.g. weekly) scan of the data integrity to make sure that nothing has become corrupted on either drive?

1) A disk utility scan (e.g. fsck) will check the drive, but not necessarily the data.
2) An rsync would tell me whether the versions are different.
3) A hash on 'completed' folders?

Are there any better ways to check the integrity of the data on the disks and make sure that nothing has been corrupted?

There is 1 answer

Patrick Tucci (best answer)

I had a similar concern a while ago with my backups. I found that I could easily back up data, but ensuring that both the backup and the original file were valid, then replacing either if they became corrupted, was a chore. I developed a C# app to do this for me, but it was cumbersome and not all that efficient.

I eventually moved to a NAS4Free-based NAS with a ZFS mirror. ZFS has a strong focus on data integrity: it computes block-level checksums and stores them separately from the blocks they cover. If you create a ZFS mirror, the data is present in two (or more) places and you can scrub the mirror. The scrub goes through each block in the mirror and ensures that the data matches its checksum. If it does not, ZFS repairs the bad block from a valid copy elsewhere in the mirror.
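To make the scrub idea concrete, here is a toy model in Python. It is not how ZFS is implemented and does not touch a real pool; the `scrub` function, the in-memory "disks" and the block contents are all made up for illustration. It only shows the principle: checksums are kept apart from the data, every copy of every block is checked, and a bad copy is overwritten from a good one.

```python
import hashlib


def checksum(block: bytes) -> str:
    """SHA-256 hex digest of one block (stand-in for a real block checksum)."""
    return hashlib.sha256(block).hexdigest()


def scrub(copies, checksums):
    """Check every block of every copy against its stored checksum.

    A copy that fails the check is overwritten from a copy that passes.
    Returns a list of (copy_index, block_index) pairs that were repaired.
    Raises ValueError if no copy of a block matches its checksum.
    """
    repaired = []
    for i, expected in enumerate(checksums):
        # Find any copy of block i that still matches the stored checksum.
        good = next((c[i] for c in copies if checksum(c[i]) == expected), None)
        if good is None:
            raise ValueError(f"block {i}: no valid copy left")
        for n, copy in enumerate(copies):
            if checksum(copy[i]) != expected:
                copy[i] = good  # self-heal from the valid copy
                repaired.append((n, i))
    return repaired


# Hypothetical two-way mirror with three blocks.
blocks = [b"alpha", b"bravo", b"charlie"]
checksums = [checksum(b) for b in blocks]  # stored apart from the data
disk_a = list(blocks)
disk_b = list(blocks)
disk_b[1] = b"brav0"  # simulate silent corruption on one side

repaired = scrub([disk_a, disk_b], checksums)
print(repaired)  # [(1, 1)]
```

On a real system the equivalent work is started with `zpool scrub <pool>` and the result is read with `zpool status <pool>`.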

This will take care of your local data. As for your offsite data, if you can't set up a ZFS storage solution off site, you'll probably have to archive and checksum the offsite backup before shipping it out, then check its integrity as often as required. I back up all my files as encrypted archives to Amazon Glacier and catalog the checksums in case I need to grab something from the backup later.
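The "catalog the checksums" step could look something like the sketch below. This is not the tool I actually use; the function names (`sha256_of`, `build_manifest`, `verify`) and the demo files are invented for the example, and SHA-256 is just one reasonable choice of hash. It records a hash per file, then later reports any file that is missing or no longer matches.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose hash differs."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"hello")
    (root / "sub").mkdir()
    (root / "sub" / "b.bin").write_bytes(b"\x00\x01")

    manifest = build_manifest(root)    # taken right after the backup
    clean = verify(root, manifest)     # nothing changed yet

    (root / "sub" / "b.bin").write_bytes(b"\x00\x02")  # simulate corruption
    damaged = verify(root, manifest)

print(clean, damaged)  # [] ['sub/b.bin']
```

Keep the manifest somewhere other than the drive it describes, otherwise corruption can hit both at once. Note that `verify` only checks files listed in the manifest, so files added after the manifest was built are not reported.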

There are many ways to go about this, but I found a ZFS-backed storage solution to be the easiest, most transparent and lowest-maintenance option. I hope this helps, or that it at least points you in a helpful direction.

Nas4Free

ZFS

ZFS Mirrors