When creating a graph of calculations using delayed
I'm trying to assign names so that if I visualize the graph it's readable. However, for delayed variables that are dependent on functions the name
parameter doesn't seem to affect the key. Here's a toy example:
def calc_avg(a, b):
return pd.concat([a, b], axis=1).mean(axis=1)
def calc_ratio(a, b):
return a / b
a = delayed(pd.Series(np.random.rand(10)), name='a')
b = delayed(pd.Series(np.random.rand(10)), name='b')
c = delayed(pd.Series(np.random.rand(10)), name='c')
x = delayed(calc_avg, name='avg_result')(a,b)
y = delayed(calc_ratio, name='ratio_result')(x,c)
y.visualize()
You can see the visualization here (I can't embed images), but rather than seeing 'avg_result' I see 'calc_avg-#0' and rather than see 'ratio_result' I see 'calc_ratio-#1'. If I look at x.key
or y.key
they do not match the names that I provided. Is this the expected behavior?
The key of a dask result needs to be unique for every combination of the function that was delayed, and the inputs you give it. What you see above is the expected behaviour: you are naming the function, but a call with different inputs would expect a different output, so the key must be different.
You can specify the key you'd like associated not when you define the delayed function, but when you call it: