Converting a correlation coefficient function from NumPy to Dask


I'm trying to evaluate dask by porting a method from thunder (which uses Spark). I have an equivalent numpy version, but I'm not sure how to write it using dask/distributed.

In thunder, I can take a stack of images, convert it to a series, and correlate against some signal:

imgs = thunder.images.fromrandom((10, 900, 900))
series = imgs.toseries()
signal = series[5, 5, :]
correlated = series.correlate(signal)

The numpy version looks like this:

import numpy

series = numpy.random.rand(900, 900, 10)
signal = series[5, 5, :]

# One row per pixel, one column per time point
reshaped = series.reshape(900 * 900, 10)

# Correlate each pixel's time series against the signal
correlated = numpy.asarray(
    [numpy.corrcoef(x, signal)[0, 1] for x in reshaped]
)
final = correlated.reshape(900, 900)

I'm looking for some tips on how to convert this into something for distributed in particular.

1 Answer

Answer by MRocklin

Perhaps something like the following?

import dask.array as da
import numpy as np

imgs = da.random.random((10, 900, 900), chunks=(1, 900, 900))
reshaped = imgs.reshape((10, 900 * 900))

If you wanted to correlate your images against each other:

result = da.corrcoef(reshaped)
result.compute()
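As a sanity check on what da.corrcoef returns, here is a minimal sketch on a much smaller stack. The sizes (4 images of 6 x 6 pixels), the fixed seed, and the use of da.from_array are assumptions made for illustration only. Each row is treated as one variable, so the result is an images-by-images matrix that should agree with numpy.corrcoef on the same data.

```python
import dask.array as da
import numpy as np

# Illustrative sizes only: 4 flattened "images" of 6 * 6 = 36 pixels each
rng = np.random.default_rng(0)
data = rng.random((4, 36))

# One image per chunk, mirroring chunks=(1, 900, 900) in the answer
reshaped = da.from_array(data, chunks=(1, 36))

# Rows are variables, so this is the image-by-image correlation matrix
result = da.corrcoef(reshaped).compute()

print(result.shape)
print(np.allclose(result, np.corrcoef(data)))
```

The diagonal holds each image's correlation with itself, and entry (i, j) is the correlation between image i and image j.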

Or against some other signal:

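signal = np.random.random(900 * 900)
# np.corrcoef returns a 2x2 matrix per block; keep only the image-vs-signal entry
result = reshaped.map_blocks(
    lambda block: np.corrcoef(block, signal)[:1, 1:],
    chunks=(1, 1), dtype=signal.dtype)
result.compute()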
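The numpy version in the question correlates each pixel's time series against a signal of length 10, rather than each whole image against a signal. A rough sketch of that per-pixel variant follows. The helper correlate_with_signal is hypothetical (not part of dask or numpy), the 30 x 30 x 10 size and 10 x 10 x 10 chunks are assumptions chosen to keep the example small, and the Pearson coefficient is computed directly with array operations instead of calling np.corrcoef once per pixel.

```python
import dask.array as da
import numpy as np


def correlate_with_signal(block, signal):
    """Hypothetical helper: Pearson correlation of each pixel's time
    series (last axis of `block`) with the 1-D `signal`."""
    b = block - block.mean(axis=-1, keepdims=True)
    s = signal - signal.mean()
    num = (b * s).sum(axis=-1)
    den = np.sqrt((b ** 2).sum(axis=-1) * (s ** 2).sum())
    return num / den


# Illustrative sizes only: 30 x 30 pixels, 10 time points
rng = np.random.default_rng(0)
data = rng.random((30, 30, 10))

# Chunk over the spatial axes and keep the whole time axis in each chunk
series = da.from_array(data, chunks=(10, 10, 10))
signal = data[5, 5, :]  # small NumPy array, passed to every block

# Each block loses its time axis, hence drop_axis=2
correlated = series.map_blocks(
    correlate_with_signal, signal=signal, drop_axis=2, dtype=float
)
final = correlated.compute()
print(final.shape)
```

Keeping the time axis unchunked means every block can be processed independently, which should also hold when the computation is run on a distributed scheduler.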

However, I'm not very familiar with your application, so the response above may be flawed.