Using Holoviews NdOverlay with cuDF or cupy


I am trying to render a line plot with a dataframe of ~25000 time series samples (each with 226 time steps). I have had success using this guide on plotting large datasets with Holoviews NdOverlay:

https://holoviews.org/user_guide/Large_Data.html (specifically the multidimensional plots section)

import datashader as ds
import holoviews as hv
import holoviews.operation.datashader as hd

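# df is a pandas DataFrame here, with one time series per row
lines = {i: hv.Curve(df.iloc[i].values) for i in range(len(df))}
lineoverlay = hv.NdOverlay(lines)
# Rasterize all curves into a single image, counting the lines that cross each pixel
plot = hd.datashade(lineoverlay, pixel_ratio=1, line_width=1, aggregator=ds.count()).opts(width=600)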

I then take the mean of the dataframe, plot it with hv.Curve, and overlay it on the datashaded plot:

mean = hv.Curve((x, df_mean.values[0]))
plot2 = hv.Overlay([plot, mean]).collate()

The issue is that rendering takes an extremely long time, so I thought I would try the GPU-accelerated cuDF and CuPy packages from RAPIDS. Both methods I tried fail at the dictionary construction step.

import cuml
import cupy as cp
import cudf

df = cudf.from_pandas(df)
lines = {i: hv.Curve(df.iloc[i].values) for i in range(len(df))}

or

lines = {i: hv.Curve(cp.asarray(df.iloc[i].values)) for i in range(len(df))}

both give the error:

holoviews.core.data.interface.DataError: None of the available storage backends were able to support the supplied data format.

Any input or other ways of going about the problem would be appreciated!


There is 1 answer

AJ_ On

There is a way to get this to work: store the lines as columns of a single cuDF dataframe and pass each column to hv.Curve as a small dataframe, since that is the format hv.Curve seems to accept.

The following is one way to plot the requested charts using cuDF and HoloViews.

To generate the data, I am using the time_series() function from https://holoviews.org/user_guide/Large_Data.html; num_ks is the number of curves to generate.

import cudf
import holoviews as hv
import numpy as np

df = cudf.DataFrame({"x"+str(i): cudf.Series(time_series(N=10000, S0=200+np.random.rand())) for i in range(num_ks)})
lines = {i: hv.Curve(df[[i]].reset_index()) for i in df.columns}
lineoverlay = hv.NdOverlay(lines, kdims='k')
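
Note that the question's dataframe holds one series per row (it is indexed with df.iloc[i]), while the snippet above expects one series per column. The following is a minimal sketch of that reshaping, not a definitive recipe. The random data, the names wide and by_column, and the pandas fallback are assumptions made only so the sketch runs without a GPU.

```python
import numpy as np

try:
    import cudf as xdf  # GPU dataframes, if RAPIDS is installed
except ImportError:
    import pandas as xdf  # CPU fallback, assumed here only so the sketch runs anywhere

# Hypothetical stand-in for the question's data: 5 series (rows) x 226 time steps (columns).
rng = np.random.default_rng(0)
values = rng.standard_normal((5, 226))
wide = xdf.DataFrame(values)

# Transpose so that each column holds one series and each row is one time step.
by_column = wide.T.reset_index(drop=True)
# Give the columns string names, matching the "x0", "x1", ... scheme used above.
by_column.columns = [f"x{i}" for i in range(by_column.shape[1])]

print(by_column.shape)  # (226, 5)
```

Each column of by_column can then be wrapped the same way as in the answer, for example hv.Curve(by_column[["x0"]].reset_index()).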


Generating the overlaying mean line:

# Row-wise mean across all series (columns), giving one mean value per time step
mean = hv.Curve(cudf.DataFrame({'x': df.mean(axis=1)}).reset_index())
hv.Overlay([lineoverlay, mean]).collate().opts(width=800)
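
To make the shape of the mean line's input explicit, here is a small sketch of the two-column frame that ends up being passed to hv.Curve. It assumes the column-per-series layout used in this answer. The random data and the pandas fallback are illustrative assumptions, not part of the original answer.

```python
import numpy as np

try:
    import cudf as xdf  # GPU dataframes, if RAPIDS is installed
except ImportError:
    import pandas as xdf  # CPU fallback, assumed here only so the sketch runs anywhere

# Hypothetical data: 4 series, each with 226 time steps, stored one series per column.
rng = np.random.default_rng(0)
values = rng.standard_normal((4, 226))
df = xdf.DataFrame({f"x{i}": values[i] for i in range(values.shape[0])})

# axis=1 averages across the columns, so the result has one value per time step.
mean_frame = xdf.DataFrame({"x": df.mean(axis=1)}).reset_index()

print(list(mean_frame.columns))  # ['index', 'x']
```

Given a two-column frame like this, hv.Curve uses the first column ("index") as the key dimension and the second ("x") as the value dimension.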
