I am trying to render a line plot from a dataframe of ~25,000 time series (226 time steps each). I have had success following this guide on plotting large datasets with HoloViews NdOverlay:
https://holoviews.org/user_guide/Large_Data.html (specifically the multidimensional plots section)
import datashader as ds
import holoviews as hv
import holoviews.operation.datashader as hd
lines = {i: hv.Curve(df.iloc[i].values) for i in range(len(df))}
lineoverlay = hv.NdOverlay(lines)
plot = hd.datashade(lineoverlay, pixel_ratio=1, line_width=1, aggregator=ds.count()).opts(width=600)
I then take the mean of the dataframe, plot it with hv.Curve:
mean = hv.Curve((x, df_mean.values[0]))
and overlay the two plots:
plot2 = hv.Overlay([plot, mean]).collate()
The issue is that rendering takes an extremely long time, so I thought I would try the GPU-accelerated cuDF and CuPy packages from RAPIDS. Both methods I tried fail at the dictionary construction.
import cuml
import cupy as cp
import cudf
df = cudf.from_pandas(df)
lines = {i: hv.Curve(df.iloc[i].values) for i in range(len(df))}
or
lines = {i: hv.Curve(cp.asarray(df.iloc[i].values)) for i in range(len(df))}
both give the error:
holoviews.core.data.interface.DataError: None of the available storage backends were able to support the supplied data format.
Any input or other ways of going about the problem would be appreciated!
There is a way to get it to work: convert the lines to a single cuDF dataframe, as that is the format hv.Curve seems to work with.
The following is one way to plot the requested charts using cuDF and HoloViews.
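A minimal sketch of the single-dataframe idea, under stated assumptions: every series is stacked into one long x/y table, with a NaN row after each series so that Datashader breaks the line there instead of joining the end of one series to the start of the next. The names to_long_frame, shade_lines and use_gpu are hypothetical helpers for illustration, not HoloViews or cuDF API. The GPU branch assumes RAPIDS is installed, and whether it is faster will depend on your versions and hardware.

```python
import numpy as np
import pandas as pd


def to_long_frame(wide):
    """Stack each row of `wide` into one x/y table, adding a NaN row after every series."""
    n_series, n_steps = wide.shape
    # x runs 0..n_steps-1 for each series, followed by one NaN separator slot.
    x = np.tile(np.append(np.arange(n_steps, dtype="float64"), np.nan), n_series)
    # Pad each row with a trailing NaN, then flatten row by row.
    pad = np.full((n_series, 1), np.nan)
    y = np.hstack([wide.to_numpy(dtype="float64"), pad]).ravel()
    return pd.DataFrame({"x": x, "y": y})


def shade_lines(long_df, use_gpu=False):
    """Datashade the long table as a single Curve.

    Imports are local so the reshaping above can run without the plotting stack.
    """
    import datashader as ds
    import holoviews as hv
    import holoviews.operation.datashader as hd

    if use_gpu:
        import cudf  # assumption: RAPIDS installed and a CUDA GPU available
        long_df = cudf.from_pandas(long_df)
    curve = hv.Curve(long_df, "x", "y")
    return hd.datashade(curve, aggregator=ds.count())


# Small stand-in dataset: 5 random walks of 226 steps each.
wide = pd.DataFrame(np.random.default_rng(0).normal(size=(5, 226)).cumsum(axis=1))
long_df = to_long_frame(wide)
```

Building one Curve avoids constructing tens of thousands of Python objects in the dictionary comprehension, which is likely where much of the time goes in the NdOverlay version.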
I am using the time_series() function from https://holoviews.org/user_guide/Large_Data.html to generate the data.
Generating the overlaying mean line:
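A minimal sketch of the mean-line step, assuming the data is a wide pandas dataframe with one row per series and one column per time step (with cuDF, calling to_pandas() first is one option). The helpers mean_series and overlay_mean are hypothetical names for illustration, and the overlay assumes a plotting backend has already been loaded with hv.extension.

```python
import numpy as np
import pandas as pd


def mean_series(wide):
    """Return (x, mean), where mean averages all series at each time step."""
    x = np.arange(wide.shape[1])
    mean = wide.mean(axis=0).to_numpy()  # one value per column (time step)
    return x, mean


def overlay_mean(shaded, wide):
    """Overlay the mean curve on an already datashaded plot.

    The import is local so mean_series can be used without HoloViews installed.
    Assumes hv.extension(...) was called beforehand so that opts are available.
    """
    import holoviews as hv

    x, mean = mean_series(wide)
    return shaded * hv.Curve((x, mean)).opts(color="red")


# Tiny example: two series of three steps.
x, mean = mean_series(pd.DataFrame([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
```

Averaging over axis=0 gives one mean per time step, so the result has the same length as a single series and can be drawn as an ordinary Curve on top of the shaded image.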