Fast and efficient way to detect if two images are visually identical in Python


Given two images:

image1.jpg
image2.jpg

What's a fast way to detect if they are visually identical in Python? For example, they may have different EXIF data, which would yield different file checksums even though the image data is the same.

ImageMagick has an excellent tool, "identify", that produces a visual hash of an image, but it's very processor intensive.
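For illustration, here is a minimal sketch of the kind of check I mean (assuming Pillow is available): hash the decoded pixel data rather than the file bytes, so EXIF and other metadata are ignored.

import hashlib
from PIL import Image

def pixel_digest(path):
    # hash the decoded pixel data, not the file bytes, so metadata is ignored
    with Image.open(path) as im:
        return hashlib.sha256(im.convert("RGB").tobytes()).hexdigest()

print(pixel_digest("image1.jpg") == pixel_digest("image2.jpg"))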


There are 5 answers

Kurt Pfeifle

I'm still submitting my way of tackling this, even though the OP says that ImageMagick's approach is too processor intensive (and even though my way does not involve Python). Maybe my answer is useful to other people arriving at this page via a search engine.

Be aware that any image comparison that is supposed to discover fine differences in high-resolution images is more processor intensive than discovering big differences in low-resolution images, simply because it has to compare far more pixels.

Visualization of Differences

Here is an ImageMagick command that compares two (same-sized!) images and marks all differing pixels in red and all identical pixels in white. The first variant keeps the reference image as a faded-out background behind the red-and-white pixel matrix; the second (with -compose src) outputs only the red-and-white matrix. .img may be any of the IM-supported formats (.png, .jpg, .jpeg, .tif, .tiff, .ppm, .gif, .pdf, ...):

 compare reference.img similar.img  delta.img
 compare reference.img similar.img  -compose src delta.img

By default, the comparison is made at 72 PPI. If you need more resolution (for example with a vector-based input such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly:

 compare -density 300 reference.img similar.img delta.img

If you add a fuzz factor, you can tell ImageMagick to treat as identical all pixels that are no more than a certain color distance apart:

 compare -fuzz '3%' reference.img similar.img -compose src delta.img

pHash-ed difference value

More recent versions of ImageMagick support the phash algorithm:

 compare -metric phash reference.img similar.img -compose src delta.img

This will, besides creating the delta.img for visualization, return a numeric value that indicates the "difference" between the two images. The closer it is to 0, the more similar the two compared images are.
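If you do want to drive this from Python after all, here is a minimal sketch along these lines (assuming ImageMagick's compare is on the PATH; the metric value is printed to stderr, and null: discards the delta image):

import subprocess

def phash_distance(path1, path2):
    # compare writes the metric value to stderr, e.g. "7.61662"
    proc = subprocess.run(
        ["compare", "-metric", "phash", path1, path2, "null:"],
        capture_output=True, text=True,
    )
    return float(proc.stderr.split()[0])

print(phash_distance("reference.img", "similar.img") == 0.0)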

Examples:

Create a few small PDF pages with minor differences in them. I'm using Ghostscript:

gs -o ref1.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (0) show showpage"

gs -o ref2.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (0) show showpage"

gs -o ref3.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (O) show showpage"

gs -o ref4.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (O) show showpage"

Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI:

compare -metric phash ref1.pdf ref3.pdf delta-ref1-ref3.pdf
  7.61662

The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.

Let's look at the visualization. The following command creates a side-by-side view of the three PDFs/images (described below):

convert                                    \
   -mattecolor blue                        \
      \( ref1.pdf -frame 2x2 \)            \
    null:                                  \
      \( ref3.pdf -frame 2x2 \)            \
    null:                                  \
      \( delta-ref1-ref3.pdf -frame 2x2 \) \
   +append                                 \
    ref1-ref3-delta.png 

Visualization of differences: ref1.pdf (left), ref3.pdf (center) and ref1-ref3-delta.png (right)

As you can see, the different shapes of the 0 (digit zero) and the O (capital letter o) stand out quite well.

Now the next comparison: ref1.pdf versus ref2.pdf, also at 72 PPI.

compare -metric phash ref1.pdf ref2.pdf delta-ref1-ref2.pdf
  0

The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!

Create a side-by-side visualization of the three PDFs/images:

convert                                    \
   -mattecolor blue                        \
      \( ref1.pdf -frame 2x2 \)            \
    null:                                  \
      \( ref2.pdf -frame 2x2 \)            \
    null:                                  \
      \( delta-ref1-ref2.pdf -frame 2x2 \) \
   +append                                 \
    ref1-ref2-delta.png 

Visualization of differences: ref1.pdf (left), ref2.pdf (center) and ref1-ref2-delta.png (right)

As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (which would be indicated by red pixels). According to the Ghostscript commands, both show the digit 0, but at positions shifted by 0.1 pt in the x and y directions. So in the original PDFs there IS a difference; it just isn't visible when rendered at 72 PPI.

Let's try to see the difference with density 600 then:

compare        \
 -metric phash \
 -density 600  \
  ref1.pdf     \
  ref2.pdf     \
  ref1-ref2-at-density600-delta.png 

0.00172769

The returned pHash value at 600 PPI is now 0.00172769. This is close to zero, but still indicates a difference, and a smaller one than that between ref1.pdf and ref3.pdf.

The difference is now clearly highlighted in the visual comparison, even if only by a thin line of red pixels.

RodrigoOlmo

Using PIL/Pillow:

from PIL import Image

im1 = Image.open('image1.jpg')
im2 = Image.open('image2.jpg')

if list(im1.getdata()) == list(im2.getdata()):
    print("Identical")
else:
    print("Different")
fmw42

One way to do that in Python/OpenCV is to take the absolute difference (absdiff) of the two images, then take the mean of that difference over the whole image.

Input 1 is lena.png and Input 2 is lena.jpg (the same picture saved as PNG and as JPG).

import cv2
import numpy as np

# read image 1
img1 = cv2.imread('lena.png')

# read image 2
img2 = cv2.imread('lena.jpg')

# do absdiff
diff = cv2.absdiff(img1,img2)

# get mean of absdiff
mean_diff = np.mean(diff)

# print result
print(mean_diff)

1.8992767333984375
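A hedged wrapper around the same idea, guarding against different dimensions and allowing a small tolerance for lossy-compression noise (the threshold of 2.0 is an assumption; tune it for your data):

import cv2
import numpy as np

def visually_identical(path1, path2, tol=2.0):
    img1 = cv2.imread(path1)
    img2 = cv2.imread(path2)
    # unreadable files or different dimensions count as "not identical"
    if img1 is None or img2 is None or img1.shape != img2.shape:
        return False
    return float(np.mean(cv2.absdiff(img1, img2))) <= tol

print(visually_identical('lena.png', 'lena.jpg'))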
jcupitt

Just because no one has mentioned it yet, Spatial CIELAB is another useful image similarity metric.

It's simpler than it sounds: you blur the two images by an amount related to the acuity of your observer, then find the CIELAB difference (delta E). You can take the peak or average of the difference image, depending on your application.

Using pyvips, you could write:

#!/usr/bin/python3

import sys
import pyvips

# the access hint means these images can be streamed in parallel rather 
# than fully decoded
image1 = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
image2 = pyvips.Image.new_from_file(sys.argv[2], access="sequential")

# blur by an amount related to the visual acuity of the observer -- this will
# help remove peaks caused by small alignment differences, then take the
# CIELAB76 colour difference
sigma = 3.0
# diff = image1.gaussblur(sigma).dE76(image2.gaussblur(sigma))
diff = image1.resize(1.0 / sigma).dE76(image2.resize(1.0 / sigma))

# compute the peak difference ... over perhaps 20 means a visible difference
print(f"peak difference of {diff.max()} visual units")

As a small optimization, resizing rather than blurring reduces the number of pixels you need to compute the colour difference for.

On this PC, it computes the difference for a pair of 6k x 4k JPEGs in about 400 ms.

$ vipsheader ~/pics/theo.jpg
/home/john/pics/theo.jpg: 6048x4032 uchar, 3 bands, srgb, jpegload
$ time ./try51.py ~/pics/theo2.jpg ~/pics/theo.jpg
peak difference of 0.0 visual units
real    0m0.396s
user    0m0.952s
sys 0m0.197s
Xuemin LU

You can use https://github.com/andrewekhalel/sewar to compare image similarity:

>>> from sewar.full_ref import uqi
>>> uqi(img1, img2)
0.9586952304831419
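
For context, a fuller sketch of the same call (assuming Pillow and NumPy are used to load the files into the arrays sewar expects; uqi returns 1.0 for identical images):

import numpy as np
from PIL import Image
from sewar.full_ref import uqi

# sewar's full-reference metrics take numpy arrays, not file paths
img1 = np.asarray(Image.open('image1.jpg').convert('RGB'))
img2 = np.asarray(Image.open('image2.jpg').convert('RGB'))

# uqi is 1.0 for identical images; values close to 1 mean very similar
print(uqi(img1, img2))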