In one of our community sites, we allow users to upload images. These images are approved or rejected by our moderators.
To limit the work needed by our administrators, we want to 'log' each rejected picture to some kind of database, and do a lookup in this database before submitting an image for approval. If a similar image has already been rejected, the uploaded image won't be submitted for approval.
We can of course just log things like the filename, size and MD5 of the picture to check for similarity, but it would definitely be better if we could also find differently cropped or resized images.
TinEye.com provides similar functionality.
Do you know of any open-source software capable of this? Do you have any other ideas?
Thanks!
To detect resized and lossily compressed images, you could resize each image to some standard size (like 40x40px), subtract the known image from the new image pixel by pixel, and compare the resulting distance to a threshold.
Unfortunately, this doesn't work with rotation or cropping. In that case you'd need to extract scale-invariant features from the image.
Another problem with this approach is that, with a naive implementation, the computational cost is linear in the size of the list of known images, so comparing the new image against all old images might quickly get too expensive.
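The resize-and-subtract idea can be sketched roughly as follows. This is only a minimal illustration, not a tested production approach: it assumes the images have already been decoded into grayscale pixel grids (nested lists of values from 0 to 255), and the function names, the box-filter downscaling and the threshold of 10 are all made up for the example. In practice you would decode and resize with an image library and tune the threshold on your own data.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale values in 0..255


def thumbnail(img: Gray, size: int = 40) -> List[List[float]]:
    """Downscale to size x size by averaging each source block (box filter)."""
    h, w = len(img), len(img[0])
    out = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)  # at least one source column
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def distance(a: Gray, b: Gray) -> float:
    """Mean absolute per-pixel difference between two equal-sized thumbnails."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def is_similar(new_img: Gray, known_img: Gray,
               threshold: float = 10.0, size: int = 40) -> bool:
    """True if the thumbnails differ by no more than the (assumed) threshold."""
    return distance(thumbnail(new_img, size), thumbnail(known_img, size)) <= threshold
```

Checking an upload then means calling `is_similar(upload, rejected)` for every rejected image, which is exactly the linear scan mentioned above. Storing the 40x40 thumbnails instead of the full images at least keeps each individual comparison cheap.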