Calculating distances between unique Python array regions?

2.3k views Asked by At

I have a raster with a set of unique ID patches/regions which I've converted into a two-dimensional Python numpy array. I would like to calculate pairwise Euclidean distances between all regions to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a raster, a solution needs to account for diagonal distances across cells (I can always convert any distances measured in cells back to metres by multiplying by the raster resolution).

I've experimented with the cdist function from scipy.spatial.distance as suggested in this answer to a related question, but so far I've been unable to solve my problem using the available documentation. As an end result I would ideally have a 3 by X array in the form of "from ID, to ID, distance", including distances between all possible combinations of regions.

Here's a sample dataset resembling my input data:

import numpy as np
import matplotlib.pyplot as plt

# Sample study area array
example_array = np.array([[0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                          [0, 0, 2, 0, 2, 2, 0, 6, 0, 3, 3, 3],
                          [0, 0, 0, 0, 2, 2, 0, 0, 0, 3, 3, 3],
                          [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3],
                          [1, 1, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 5, 5, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]])

# Plot array
plt.imshow(example_array, cmap="spectral", interpolation='nearest')

Example array with numbered regions

1

There are 1 answers

1
rth On BEST ANSWER

Distances between labeled regions of an image can be calculated with the following code,

import itertools
from scipy.spatial.distance import cdist

# making sure that IDs are integer
example_array = np.asarray(example_array, dtype=np.int) 
# we assume that IDs start from 1, so we have n-1 unique IDs between 1 and n
n = example_array.max()

indexes = []
for k in range(1, n):
    tmp = np.nonzero(example_array == k)
    tmp = np.asarray(tmp).T
    indexes.append(tmp)

# calculating the distance matrix
distance_matrix = np.zeros((n-1, n-1), dtype=np.float)   
for i, j in itertools.combinations(range(n-1), 2):
    # use squared Euclidean distance (more efficient), and take the square root only of the single element we are interested in.
    d2 = cdist(indexes[i], indexes[j], metric='sqeuclidean') 
    distance_matrix[i, j] = distance_matrix[j, i] = d2.min()**0.5

# mapping the distance matrix to labeled IDs (could be improved/extended)
labels_i, labels_j = np.meshgrid( range(1, n), range(1, n))  
results = np.dstack((labels_i, labels_j, distance_matrix)).reshape((-1, 3))

print(distance_matrix)
print(results)

This assumes integer IDs, and would need to be extended if that is not the case. For instance, with the test data above, the calculated distance matrix is,

# From  1             2         3            4              5         # To
[[  0.           4.12310563   4.           9.05538514   5.        ]   # 1
 [  4.12310563   0.           3.16227766  10.81665383   8.24621125]   # 2
 [  4.           3.16227766   0.           4.24264069   2.        ]   # 3 
 [  9.05538514  10.81665383   4.24264069   0.           3.16227766]   # 4
 [  5.           8.24621125   2.           3.16227766   0.        ]]  # 5

while the full output can be found here. Note that this takes the Eucledian distance from the center of each pixel. For instance, the distance between zones 1 and 3 is 2.0, while they are separated by 1 pixel.

This is a brute-force approach, where we calculate all the pairwise distances between pixels of different regions. This should be sufficient for most applications. Still, if you need better performance, have a look at scipy.spatial.cKDTree which would be more efficient in computing the minimum distance between two regions, when compared to cdist.