Fastest way to count the number of occurrences of a list of items from a numpy.ndarray


I have a histogram of an image. Basically, a histogram of an image is a graph showing how many times each pixel value (0-255) occurs in the image, with the number of occurrences on the Y axis and the pixel value on the X axis.

[histogram plot of the image]
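For reference, the histogram itself can be computed along these lines (just a sketch of the idea; np.histogram is my assumption here, not necessarily how the plot above was produced):

import cv2
import numpy as np

image = cv2.imread('grade_0.jpg')
# hist[v] = how many times the value v occurs among the image's 0-255 values
hist, bin_edges = np.histogram(image.ravel(), bins=256, range=(0, 256))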

What I need is the total number of pixels with values from 75 to 125.

import cv2

image = cv2.imread('grade_0.jpg')
listOfNumbers = image.ravel()  # flattens the image into a 1-D numpy.ndarray of 0-255 values

Right now my code does this by converting the numpy.ndarray to a list and counting each value one by one:

import time

start = time.time()
numberlist = list(listOfNumbers)

sum = 0
for x in range(75,125):
    sum = sum + numberlist.count(x)
end = time.time()

print('Sum: ' + str(sum))
print('Execution time_ms: ' + str((end-start) * 10**3))

Result:

Sum: 57111
Execution time_ms: 13492.571830749512

I would be doing something like this for thousands of images, and with just this one image it took 13 seconds, which is just too inefficient. Any recommendation on how to speed it up to less than about 10 ms? I won't just be getting the sum of 75-125, but other ranges as well, e.g. 0-80, 75-125, 120-220, 210-255. Assuming each of those also takes 13 seconds, processing a single 256x256 pixel image would take about 60 seconds, which I would say is a bit long even for a slow computer.

Here is a sample image:

[sample image: grade_0.jpg]


There are 2 answers

Answer by mozway (accepted):

You can use simple boolean operators:

import cv2
import numpy as np

image = cv2.imread('grade_0.jpg')

out = ((image>=75)&(image<125)).sum()

# 57032

Or, as suggested by @jared:

out = np.count_nonzero((image>=75)&(image<125))

Timing of the count:

# sum
170 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# count_non_zero
47.6 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Edit: I realize you want to process several bins; this could be done using:

bins = [(0,80),(75,125),(120,220),(210,255)]
out = {f'{a}-{b}': np.count_nonzero((image>=a)&(image<b)) for a, b in bins}
# {'0-80': 26274, '75-125': 57032, '120-220': 86283, '210-255': 40967}

But this will re-read the image's data for each bin.

In this case, np.bincount, as suggested by @Andrej, could indeed be preferred since it only reads the pixels once:

bins = [(0,80),(75,125),(120,220),(210,255)]

counts = np.bincount(image.ravel())
out = {'-'.join(map(str, t)): counts[slice(*t)].sum() for t in bins}
# {'0-80': 26274, '75-125': 57032, '120-220': 86283, '210-255': 40967}

The timings will depend on the size of the image and the number of bins. For small images, counting each bin separately might be more efficient, while for large ones bincount could be better (but, surprisingly, not always). A sketch for reproducing this comparison follows the timings below.

256 x 256:

# count_nonzero in loop
198 µs ± 8.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# bincount
440 µs ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

512 x 512:

# count_nonzero in loop
918 µs ± 31 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# bincount
1.76 ms ± 26.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

1024 x 1024:

# count_nonzero in loop
11 ms ± 210 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# bincount
8.15 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

2048 x 2048:

# count_nonzero in loop
47.1 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# bincount
48.8 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
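Here is a minimal sketch of how this comparison could be reproduced (the random uint8 array is only a stand-in for a real image, and the helper names are mine, just for illustration):

import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8)  # stand-in image

bins = [(0, 80), (75, 125), (120, 220), (210, 255)]

def count_nonzero_loop():
    # one boolean pass over the image per bin
    return {f'{a}-{b}': np.count_nonzero((image >= a) & (image < b)) for a, b in bins}

def bincount_once():
    # count every value once, then sum slices per bin
    counts = np.bincount(image.ravel(), minlength=256)
    return {f'{a}-{b}': counts[a:b].sum() for a, b in bins}

n = 100
print('count_nonzero in loop:', timeit(count_nonzero_loop, number=n) / n * 1e3, 'ms')
print('bincount:', timeit(bincount_once, number=n) / n * 1e3, 'ms')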
Answer by Andrej Kesely:

You can use np.bincount:

y = np.bincount(arr)
print(y[75:125].sum())

Prints:

57032

Full code:

import numpy as np
from PIL import Image

# Open your image file:
image_path = "image.png"
image = Image.open(image_path)

arr = np.array(image).ravel()  # flatten to a 1-D array of pixel values

y = np.bincount(arr)  # y[v] = number of pixels with value v
print(y[75:125].sum())
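If the image is loaded with cv2 as in the question, the same idea should apply (a sketch, assuming the question's grade_0.jpg; note that cv2.imread returns a 3-channel BGR array by default, so each value is counted once per channel unless you read the image as grayscale):

import cv2
import numpy as np

image = cv2.imread('grade_0.jpg')  # 3-channel BGR by default
y = np.bincount(image.ravel(), minlength=256)
print(y[75:125].sum())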