Given two images:
image1.jpg
image2.jpg
What's a fast way to detect whether they are visually identical in Python? For example, they may have different EXIF data, which would yield different checksums even though the image data is the same.
ImageMagick has an excellent tool, "identify", that produces a visual hash of an image, but it's very processor intensive.
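One common way to keep metadata out of the checksum is to hash only the decoded pixel data. The sketch below shows the idea using only the standard library. It parses binary PPM (P6) files, whose header may contain '#' comment lines that stand in for metadata here. The function names are hypothetical. For JPEGs you would need a real decoder. Pillow's Image.open(path).convert("RGB").tobytes() is one option, but that is an assumption and is not shown here. Also note that this only detects byte-identical pixels: two separately encoded JPEGs of the same picture will usually not match.

```python
import hashlib
from typing import List, Tuple


def read_ppm_pixels(data: bytes) -> Tuple[int, int, bytes]:
    """Parse an 8-bit binary PPM (P6) and return (width, height, raw RGB bytes).

    '#' comment lines in the header are skipped, so they never reach the
    pixel data. Only maxval <= 255 (one byte per sample) is handled here.
    """
    tokens: List[bytes] = []
    pos = 0
    # The header holds 4 whitespace-separated tokens: magic, width, height, maxval.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():  # skip whitespace
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated PPM header")
        if data[pos:pos + 1] == b"#":  # skip a comment up to the end of its line
            pos = data.index(b"\n", pos) + 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace() and data[pos:pos + 1] != b"#":
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # a single whitespace byte separates maxval from the raster
    magic, width, height, maxval = tokens
    if magic != b"P6" or int(maxval) > 255:
        raise ValueError("this sketch only supports 8-bit binary PPM (P6)")
    w, h = int(width), int(height)
    pixels = data[pos:pos + w * h * 3]
    if len(pixels) != w * h * 3:
        raise ValueError("truncated pixel data")
    return w, h, pixels


def pixel_digest(data: bytes) -> str:
    """SHA-256 over the image size and decoded pixels, ignoring header comments."""
    w, h, pixels = read_ppm_pixels(data)
    digest = hashlib.sha256(f"{w}x{h}:".encode())
    digest.update(pixels)
    return digest.hexdigest()


# Two 2x1 images with identical pixels but different header comments.
img_a = b"P6\n# created by camera A\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
img_b = b"P6\n# edited later\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
print(hashlib.sha256(img_a).hexdigest() == hashlib.sha256(img_b).hexdigest())  # False
print(pixel_digest(img_a) == pixel_digest(img_b))  # True
```

This digest is cheap because it never looks at how similar two images are. It only answers "same pixels or not".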
I'm submitting my approach anyway, even though the OP says that ImageMagick's way is too processor intensive (and even though my way does not involve Python). Maybe my answer will be useful to other people who arrive at this page via a search engine.
Be aware that any image comparison meant to discover fine differences in high-resolution images is more processor intensive than one that only has to find big differences in low-resolution images, because it has to compare many more pixels.
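The following plain arithmetic counts the pixels a page turns into when it is rasterized at a given resolution. The US Letter page size (8.5 x 11 inches) is just an assumed example. The pixel count grows with the square of the PPI, so going from 72 to 600 PPI means roughly 69 times as many pixels to compare.

```python
def pixel_count(width_in: float, height_in: float, ppi: int) -> int:
    """Number of pixels when a page of the given size (in inches) is rasterized at ppi."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px


# Assumed example: a US Letter page (8.5 x 11 inches).
low = pixel_count(8.5, 11, 72)    # 612 x 792
high = pixel_count(8.5, 11, 600)  # 5100 x 6600
print(low, high, round(high / low, 1))  # 484704 33660000 69.4
```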
Visualization of Differences
Here are ImageMagick commands that compare two (same-sized!) images and render all differing pixels as red and all identical pixels as white. The first one uses the reference image as a faded-out background for the red-and-white pixel matrix.
The .img suffix may be any of the IM-supported formats (.png, .PnG, .pNG, .PNG, .jpg, .jpeg, .jPeG, .tif, .tiff, .ppm, .gif, .pdf, ...).
By default, the comparison is made at 72 PPI. If you need more resolution (for example, with a vector-based image such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly.
If you add a fuzz factor, you can tell ImageMagick to treat pixels as identical when they are no more than a certain color distance apart.
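The logic behind the red/white visualization and the fuzz factor can be sketched in plain Python. This illustrates the idea and is not ImageMagick's implementation. The sketch makes three assumptions:

- Images are same-sized lists of rows of (R, G, B) tuples.
- The fuzz tolerance is plain Euclidean RGB distance, expressed as a percentage of the black-to-white distance. ImageMagick's own distance computation may differ in detail.
- The fade parameter is a hypothetical stand-in for the faded-out reference background.

```python
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

RED: Pixel = (255, 0, 0)
WHITE: Pixel = (255, 255, 255)
# Largest Euclidean distance between two 8-bit RGB colors (black vs. white).
MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def same_within_fuzz(p: Pixel, q: Pixel, fuzz_percent: float) -> bool:
    """True if p and q are at most fuzz_percent % of MAX_DISTANCE apart."""
    return math.dist(p, q) <= MAX_DISTANCE * fuzz_percent / 100.0


def diff_mask(ref: Image, other: Image, fuzz_percent: float = 0.0,
              fade: Optional[float] = None) -> Image:
    """Red where the images differ (beyond the fuzz tolerance), else white.

    If fade (0..1) is given, matching pixels show the reference pixel blended
    toward white by that fraction instead of plain white.
    """
    if len(ref) != len(other) or any(len(r) != len(o) for r, o in zip(ref, other)):
        raise ValueError("images must have the same size")
    out: Image = []
    for ref_row, other_row in zip(ref, other):
        row: List[Pixel] = []
        for p, q in zip(ref_row, other_row):
            if not same_within_fuzz(p, q, fuzz_percent):
                row.append(RED)
            elif fade is None:
                row.append(WHITE)
            else:
                # Linear blend of the reference pixel toward white.
                row.append(tuple(round(c + (255 - c) * fade) for c in p))
        out.append(row)
    return out


ref = [[(100, 100, 100), (0, 0, 0)]]
other = [[(102, 101, 100), (255, 255, 255)]]
print(diff_mask(ref, other))                           # [[(255, 0, 0), (255, 0, 0)]]
print(diff_mask(ref, other, fuzz_percent=5))           # [[(255, 255, 255), (255, 0, 0)]]
print(diff_mask(ref, other, fuzz_percent=5, fade=0.8)) # [[(224, 224, 224), (255, 0, 0)]]
```

With a fuzz of 0 the comparison is exact. A 5% tolerance lets the nearly identical gray pair pass, while black versus white is still flagged.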
pHash-ed difference value
More recent versions of ImageMagick support the phash algorithm. Besides creating delta.img for visualization, this returns a numeric value that indicates the "difference" between the two images: the closer it is to 0, the more similar the two compared images are.
Examples:
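The sketch below illustrates a distance that is 0 for identical images and grows with dissimilarity. It implements a much simpler perceptual hash, the "average hash", in plain Python. This is a swapped-in technique for illustration only. ImageMagick's PHASH metric uses a different, more elaborate algorithm, so the numbers are not comparable. Images are assumed to be grayscale lists of rows with values 0 to 255, at least 8 x 8 pixels.

```python
from typing import List

Gray = List[List[int]]


def average_hash(img: Gray, size: int = 8) -> int:
    """Shrink to size x size by block averaging, then set one bit per cell
    depending on whether that cell is brighter than the mean of all cells."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for cy in range(size):
        y0, y1 = cy * h // size, (cy + 1) * h // size
        for cx in range(size):
            x0, x1 = cx * w // size, (cx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hash_distance(a: Gray, b: Gray) -> int:
    """Hamming distance between the two hashes: 0 means 'looks the same'."""
    return bin(average_hash(a) ^ average_hash(b)).count("1")


# A 16x16 dark-left/bright-right test pattern and two variants of it.
base = [[0] * 8 + [255] * 8 for _ in range(16)]
slightly_brighter = [[min(v + 10, 255) for v in row] for row in base]
mirrored = [row[::-1] for row in base]
print(hash_distance(base, base))               # 0
print(hash_distance(base, slightly_brighter))  # 0
print(hash_distance(base, mirrored))           # 64
```

A small brightness change leaves the hash untouched, while a mirrored image flips every bit. This tolerance to small changes is why perceptual hashes are used instead of exact checksums.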
Create a few small PDF pages with minor differences in them. I'm using Ghostscript for this.
Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI.
The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.
Let's look at the visualization. I'll create a side-by-side visualization of the three PDFs/images (to be shown below).
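Since the question asked about Python, here is one way to drive the same comparison through the standard subprocess module. This sketch makes a few assumptions:

- ImageMagick's compare command is on the PATH. With ImageMagick 7 you might need magick compare instead.
- The helper names and the delta.png output name are hypothetical.
- As far as I know, compare prints the metric to stderr, and its exit status is 0 for similar images, 1 for dissimilar ones and 2 on error. Only a status above 1 is treated as a failure here.

```python
import subprocess
from typing import List, Optional


def build_compare_cmd(ref: str, other: str, delta: str,
                      density: Optional[int] = None,
                      fuzz_percent: Optional[float] = None,
                      faded_background: bool = True) -> List[str]:
    """Build an argv list for ImageMagick's compare with the PHASH metric."""
    cmd = ["compare", "-metric", "PHASH"]
    if density is not None:
        # -density is a read setting, so it has to come before the input files.
        cmd += ["-density", str(density)]
    if fuzz_percent is not None:
        # Color-distance tolerance for deciding which pixels count as different.
        cmd += ["-fuzz", f"{fuzz_percent}%"]
    cmd += ["-highlight-color", "red", "-lowlight-color", "white"]
    if not faded_background:
        # Pure red/white delta image without the faded reference behind it.
        cmd += ["-compose", "src"]
    return cmd + [ref, other, delta]


def parse_metric(stderr_text: str) -> float:
    """compare writes the metric to stderr; keep its first number."""
    return float(stderr_text.split()[0])


def phash_difference(ref: str, other: str, delta: str, **options) -> float:
    """Run compare (must be installed) and return the PHASH difference."""
    result = subprocess.run(build_compare_cmd(ref, other, delta, **options),
                            capture_output=True, text=True)
    if result.returncode > 1:  # 0 = similar, 1 = dissimilar, 2 = error
        raise RuntimeError(result.stderr.strip())
    return parse_metric(result.stderr)


print(build_compare_cmd("ref1.pdf", "ref3.pdf", "delta.png"))
# With ImageMagick installed, for example:
# print(phash_difference("ref1.pdf", "ref2.pdf", "delta.png", density=600))
```

Building the argument list separately from running it keeps the option order (settings before inputs) in one place, and it can be inspected without ImageMagick installed.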
As you can see, the different shapes of the 0 (digit zero) and the O (letter o, capital version) stand out quite well.
Now the next one, where ref1.pdf is compared to ref2.pdf, also at 72 PPI.
The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!
Create a side-by-side visualization of the three PDFs/images.
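The side-by-side layout itself is simple: place the rows of equal-height images next to each other with a gap between them. Below is a minimal plain-Python sketch. It assumes images stored as lists of rows of RGB tuples. The gap width and gray gap color are arbitrary choices. In ImageMagick itself, montage or convert with +append would typically be used instead.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def side_by_side(images: List[Image], gap: int = 2,
                 gap_color: Pixel = (128, 128, 128)) -> Image:
    """Concatenate equal-height images horizontally, separated by gap columns."""
    heights = {len(img) for img in images}
    if len(heights) != 1:
        raise ValueError("need at least one image, all of the same height")
    out: Image = []
    for y in range(heights.pop()):
        row: List[Pixel] = []
        for i, img in enumerate(images):
            if i > 0:
                row.extend([gap_color] * gap)  # separator between neighbors
            row.extend(img[y])
        out.append(row)
    return out


# Reference, compared image and a (fake) all-red delta, each 3 x 2 pixels.
ref = [[(255, 255, 255)] * 3 for _ in range(2)]
other = [[(255, 255, 255)] * 3 for _ in range(2)]
delta = [[(255, 0, 0)] * 3 for _ in range(2)]
result = side_by_side([ref, other, delta])
print(len(result), len(result[0]))  # 2 13
```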
As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (which would be indicated by red pixels). According to the Ghostscript command, both show the digit 0, but at positions shifted 0.1 pt apart in the x and y directions. So in reality, in the original PDFs, there IS a difference. But when rendered at 72 PPI, this difference isn't visible: since a point is 1/72 inch, a 0.1 pt shift at 72 PPI amounts to only a tenth of a pixel.
Let's try to see the difference with -density 600 then.
The returned pHash value at 600 PPI now is 0.00172769. This is close to zero, but still a difference, and it is smaller than the one between ref1.pdf and ref3.pdf.
The difference is now clearly highlighted in the visual comparison, even though only by a thin line of red pixels.
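The resolution effect can be reproduced with a toy model. A PDF point is 1/72 inch, so a shift of 0.1 pt is 0.1 pixel at 72 PPI but 0.1 x 600 / 72, or about 0.83 pixel, at 600 PPI. The sketch below point-samples a filled horizontal span at pixel centers and counts how many pixels change when the span moves by 0.1 pt. The span coordinates (10 pt to 25 pt on a 30 pt wide strip) are made-up values. Real renderers such as Ghostscript use more elaborate pixel-coverage rules and may anti-alias, so this only illustrates the principle.

```python
from typing import List


def pt_to_px(pt: float, ppi: float) -> float:
    """Convert PDF points (1/72 inch) to pixels at the given resolution."""
    return pt * ppi / 72.0


def rasterize_span(left_pt: float, right_pt: float, width_pt: float, ppi: float) -> List[bool]:
    """Point-sample the filled span [left_pt, right_pt) at each pixel center."""
    left, right = pt_to_px(left_pt, ppi), pt_to_px(right_pt, ppi)
    n = round(pt_to_px(width_pt, ppi))
    return [left <= i + 0.5 < right for i in range(n)]


def changed_pixels(shift_pt: float, ppi: float) -> int:
    """Pixels that differ when the span [10 pt, 25 pt) is shifted by shift_pt."""
    before = rasterize_span(10, 25, 30, ppi)
    after = rasterize_span(10 + shift_pt, 25 + shift_pt, 30, ppi)
    return sum(b != a for b, a in zip(before, after))


for ppi in (72, 600):
    # resolution, shift in pixels, number of changed pixels
    print(ppi, round(pt_to_px(0.1, ppi), 3), changed_pixels(0.1, ppi))
# 72 0.1 0
# 600 0.833 2
```

At 72 PPI no pixel center falls between the old and new edges, so the rasterized rows are identical. At 600 PPI exactly one pixel flips at each edge, which is the kind of thin line of red pixels seen in the comparison.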