I am trying to create an interactive plot with pan and zoom in/out interactions for large time series.
Consider the following case in a Jupyter Notebook:
import numpy as np
import holoviews as hv
import holoviews.plotting.bokeh
from holoviews.operation import decimate
hv.extension('bokeh')
n_samples = 1_000 #100_000_000
x = np.linspace(0.0,10.0, n_samples)
y = np.zeros((64, n_samples))
r = np.random.rand(n_samples)
for i in range(64):
    y[i] = np.sin(r + np.random.rand(n_samples)*0.3)+i
curves = hv.Curve( (zip(x,y[0,:])) ).opts(height=400, width=800)
for i in range(1,64):
    curves *= hv.Curve( (zip(x,y[i,:])) )
curves = curves.options({'Curve': {'color': 'black'}})
curves = decimate(curves).collate()
curves.redim(x=hv.Dimension('x', range=(0, 2)))
Using n_samples=1_000 works well, but the real number of samples is about 10-100 million points, and at that size it becomes extremely slow.
I think this happens because it creates all the graphical elements and stores them in memory. Then, when I change the x range using the Pan tool, it has to work out which of those elements need to be plotted, and that is the slow part.
If that's the case, a solution may be to plot only a subset of 1k-5k points from the arrays, chosen according to the current ranges of the canvas. I don't need all the points on the canvas, so the subset can be computed on the fly.
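As a minimal sketch of that idea (not a tested solution for the full dataset), the subset could be picked with a binary search on the visible x range followed by a fixed stride. The function name viewport_subsample and the max_points parameter are hypothetical, and the sketch assumes x is sorted in ascending order, as it is with np.linspace:

```python
import numpy as np

def viewport_subsample(x, y, x_range, max_points=2000):
    """Return at most max_points samples of (x, y) that lie inside x_range.

    Assumes x is sorted in ascending order (true for np.linspace).
    """
    lo, hi = x_range
    # Binary search for the slice of samples visible in the viewport.
    start = np.searchsorted(x, lo, side="left")
    stop = np.searchsorted(x, hi, side="right")
    n_visible = stop - start
    # Ceiling division gives a stride that keeps at most max_points samples.
    step = max(1, -(-n_visible // max_points))
    return x[start:stop:step], y[start:stop:step]


n_samples = 1_000_000
x = np.linspace(0.0, 10.0, n_samples)
y = np.sin(x)

# Only the part of the signal between x=0 and x=2 is returned, thinned out.
xs, ys = viewport_subsample(x, y, x_range=(0.0, 2.0))
```

In HoloViews, a function like this could presumably be called from the callback of an hv.DynamicMap driven by an hv.streams.RangeX stream, so that the subset is recomputed on every pan or zoom. Note that plain striding can skip narrow spikes in the signal.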
Do you know another way to solve this issue? I am new to Bokeh and HoloViews.
Thanks!
Sure. As suggested by Sander, you can use Datashader to render your data outside of the browser for speed and efficiency. Once you have defined curves, you can pass it to Datashader. You shouldn't need to decimate.