Python numpy array -- close smallest regions

465 views Asked by At

I have a 2D boolean numpy array that represents an image, on which I call skimage.measure.label to label each segmented region, giving me a 2D array of int [0,500]; each value in this array represents the region label for that pixel. I would like to now remove the smallest regions. For example, if my input array is shape (n, n), I would like all labeled regions of < m pixels to be subsumed into the larger surrounding regions. For example if n=10 and m=5, my input could be,

0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 7, 8, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 6, 6, 4, 2, 2, 2, 3, 3, 3
4, 6, 6, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5

and the output is then,

0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1  # 7 and 8 are replaced by 0
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 3, 3, 3  # 6 is gone, but 3 remains
4, 4, 4, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5

I've looked into skimage morphology operations, including binary closing, but none seem to work well for my use case. Any suggestions?

1

There are 1 answers

2
Jonas Adler On BEST ANSWER

You can do this by performing a binary dilation on the boolean region corresponding to each label. By doing this you will find the number of neighbours for each region. Using this you can then replace values as needed.

For an example code:

import numpy as np
import scipy.ndimage

m = 5

arr = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
       [4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
arr = np.array(arr)
nval = np.max(arr) + 1

# Compute number of occurances of each number
counts, _ = np.histogram(arr, bins=range(nval + 1))

# Compute the set of neighbours for each number via binary dilation
c = np.array([scipy.ndimage.morphology.binary_dilation(arr == i)
              for i in range(nval)])

# Loop over the set of arrays with bad count and update them to the most common
# neighbour
for i in filter(lambda i: counts[i] < m, range(nval)):
    arr[arr == i] = np.argmax(np.sum(c[:, arr == i], axis=1))

Which gives the expected result:

>>> arr.tolist()
[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
 [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
 [4, 4, 4, 4, 2, 2, 2, 3, 3, 3],
 [4, 4, 4, 4, 5, 5, 5, 3, 3, 5],
 [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
 [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]