Python - Plot a specific subset of large datasets using Holoviews Bokeh

I am trying to create an interactive plot with pan and zoom for large time series.

Consider the following example in a Jupyter notebook:

import numpy as np

import holoviews as hv
import holoviews.plotting.bokeh
from holoviews.operation import decimate

hv.extension('bokeh')

n_samples = 1_000 #100_000_000

x = np.linspace(0.0,10.0, n_samples)

y = np.zeros((64, n_samples))
r = np.random.rand(n_samples)

for i in range(64):
    y[i] = np.sin(r + np.random.rand(n_samples)*0.3)+i


curves = hv.Curve( (zip(x,y[0,:])) ).opts(height=400, width=800)
for i in range(1,64):
    curves *= hv.Curve( (zip(x,y[i,:])) ) 

curves = curves.options({'Curve': {'color': 'black'}})

curves = decimate(curves).collate()

curves.redim(x=hv.Dimension('x', range=(0, 2)))

With n_samples=1_000 this works well, but the real data has roughly 10 to 100 million points, and at that size the plot becomes extremely slow.

I think this happens because it creates all the graphical elements and stores them in memory. Then, when I change the x range with the Pan tool, it has to work out which of those elements need to be drawn, and that is the slow part.

If that's the case, a solution may be to plot only a subset of 1k-5k points from the arrays considering the ranges of the canvas. I don't need all the points on the canvas, so they can be computed on the fly.
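
A minimal sketch of that idea, not tested on the real data. It assumes x is sorted in ascending order (true for np.linspace), and the helper names visible_subset and dynamic_curve are made up for illustration. The first function only uses NumPy; the second wires it to the plot through a HoloViews DynamicMap driven by a RangeX stream, so the subset is recomputed whenever the visible x range changes.

```python
import numpy as np


def visible_subset(x, y, x_range, max_points=2000):
    """Return at most max_points samples of (x, y) that lie inside x_range.

    Assumes x is sorted in ascending order.
    """
    lo, hi = x_range
    # Binary search for the visible window instead of scanning the whole array.
    start = np.searchsorted(x, lo, side="left")
    stop = np.searchsorted(x, hi, side="right")
    # Stride through the window so that no more than max_points are kept.
    step = max(1, -(-(stop - start) // max_points))  # ceiling division
    return x[start:stop:step], y[start:stop:step]


def dynamic_curve(x, y, max_points=2000):
    """Build a DynamicMap whose Curve is re-sliced on every pan or zoom."""
    # Imported here so that visible_subset can be used without HoloViews.
    import holoviews as hv
    from holoviews.streams import RangeX

    def callback(x_range):
        # x_range is None until the plot has reported its first range.
        if x_range is None:
            x_range = (x[0], x[-1])
        xs, ys = visible_subset(x, y, x_range, max_points)
        return hv.Curve((xs, ys))

    return hv.DynamicMap(callback, streams=[RangeX()])
```

In the notebook this would be used as dynamic_curve(x, y[0]) for a single trace. Plain striding can skip narrow peaks, so it is only a rough stand-in for a proper downsampling or rasterizing step.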

Do you know another way to solve this issue? I am new to Bokeh and HoloViews.

Thanks!

There is 1 answer

James A. Bednar

Sure. As suggested by Sander, you can use Datashader to render your data outside of the browser for speed and efficiency. Once you have defined curves, just do:

import holoviews.operation.datashader as hd

hd.rasterize(curves)  # as the last line in your Jupyter notebook cell

You shouldn't need to decimate once the curves are rasterized.