I am writing an app to interactively analyze 2D scatter plots with millions of points. To reduce the number of points to plot, and to avoid reinventing the wheel, I use matplotlib's 2D hexagonal binning (a density plot).
The classical usage is:
```python
import numpy as np
from matplotlib import pyplot as plt

x = np.random.normal(loc=100, scale=20, size=100)
y = np.random.normal(loc=1000, scale=20, size=100)
fig, ax = plt.subplots(figsize=(10, 8))
hexbin = ax.hexbin(x, y, gridsize=25, cmap='jet', picker=True)
plt.show()
```
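For context, `ax.hexbin` returns a `PolyCollection`. As far as I can tell, `get_offsets()` gives the centre of every hexagon in data coordinates and `get_array()` gives the number of points in each hexagon, in the same order. A minimal sketch, assuming linear axes and the default `mincnt=None` (so empty hexagons are kept); the `Agg` backend and the seeded generator are only there so it runs headless and reproducibly:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=20, size=100)
y = rng.normal(loc=1000, scale=20, size=100)

fig, ax = plt.subplots(figsize=(10, 8))
hexbin = ax.hexbin(x, y, gridsize=25, cmap='jet')

centers = hexbin.get_offsets()            # (n_hexagons, 2): hexagon centres, data coordinates
counts = np.asarray(hexbin.get_array())   # (n_hexagons,): number of points per hexagon
print(centers.shape, counts.shape)
print(int(counts.sum()))                  # every point is counted in exactly one hexagon
```

So the counts are already there; what is missing is the point-to-hexagon mapping itself.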
For my app, I need to know which points belong to each bin.
For this, I have two options:
- Use shapely: extract the shape of each hexagon and use shapely to test whether each point lies inside it. Here is my proof of concept (of course it could still be optimised, e.g. to avoid re-testing points that are already allocated to a bin):
```python
import numpy as np
import shapely  # shapely >= 2.0: top-level Point/Polygon, vectorised predicates

# Reference hexagon: vertices relative to the hexagon centre, in data units
paths = hexbin.get_paths()
hexagon_vertices = paths[0].vertices
# Offset (centre) of each hexagon, in data coordinates
list_offsets = hexbin.get_offsets()

list_points = [shapely.Point(px, py) for px, py in zip(x, y)]
list_idx = []
for offset in list_offsets:
    polygon = shapely.Polygon(shell=hexagon_vertices + offset)
    is_in = polygon.contains(list_points)  # one boolean per point
    idx = np.flatnonzero(is_in)
    if len(idx) == 0:
        idx = None
    list_idx.append(idx)
```
But I think this is overkill, since matplotlib already computes this information behind the scenes. I had a look at the code of matplotlib's hexbin function, but I am a little lost... The key part seems to be here: https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/axes/_axes.py#L5061-L5062 but I would need some help to extract the useful information...
- Adapt the matplotlib code to extract this information: for this I would need some help :-)
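For the second option, here is a sketch of what I understand the binning step inside `Axes.hexbin` (matplotlib 3.x) to do: points are mapped to "hexagon index" coordinates, the nearest centre is found on two interleaved rectangular lattices (y distances weighted by 3), and the flat index lists lattice-1 hexagons first, then lattice-2 ones, which should be the order of `get_offsets()`/`get_array()`. It assumes linear axes, an integer `gridsize`, no `extent`, no `C` and `mincnt=None`. `hexbin_point_indices` and `points_per_bin` are hypothetical helper names, and since matplotlib internals can change between versions, the counts are cross-checked against `get_array()`:

```python
import math

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed for the cross-check below
from matplotlib import pyplot as plt
from matplotlib.transforms import nonsingular


def hexbin_point_indices(x, y, gridsize=100):
    """Hexagon index of every point, in get_offsets()/get_array() order.

    Assumes linear axes, integer gridsize, no extent, C=None, mincnt=None.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    nx = gridsize
    ny = int(nx / math.sqrt(3))
    nx1, ny1 = nx + 1, ny + 1  # lattice 1: centres on the grid nodes
    ny2 = ny                   # lattice 2: centres in the middle of the cells

    # Data limits, expanded only if degenerate, then padded against round-off
    xmin, xmax = nonsingular(x.min(), x.max(), expander=0.1)
    ymin, ymax = nonsingular(y.min(), y.max(), expander=0.1)
    padding = 1.e-9 * (xmax - xmin)
    xmin -= padding
    xmax += padding
    sx = (xmax - xmin) / nx
    sy = (ymax - ymin) / ny

    # Positions in "hexagon index" coordinates
    ix = (x - xmin) / sx
    iy = (y - ymin) / sy
    ix1 = np.round(ix).astype(int)  # nearest lattice-1 centre
    iy1 = np.round(iy).astype(int)
    ix2 = np.floor(ix).astype(int)  # nearest lattice-2 centre
    iy2 = np.floor(iy).astype(int)

    # Squared distances to both candidate centres (y weighted by 3)
    d1 = (ix - ix1) ** 2 + 3.0 * (iy - iy1) ** 2
    d2 = (ix - ix2 - 0.5) ** 2 + 3.0 * (iy - iy2 - 0.5) ** 2

    # Lattice-1 bins come first, then lattice-2 bins
    return np.where(d1 < d2, ix1 * ny1 + iy1, nx1 * ny1 + ix2 * ny2 + iy2)


def points_per_bin(bin_idx, n_bins):
    """List whose entry k holds the indices of the points in hexagon k."""
    order = np.argsort(bin_idx, kind="stable")
    counts = np.bincount(bin_idx, minlength=n_bins)
    return np.split(order, np.cumsum(counts)[:-1])


rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=20, size=100)
y = rng.normal(loc=1000, scale=20, size=100)

fig, ax = plt.subplots(figsize=(10, 8))
hexbin = ax.hexbin(x, y, gridsize=25, cmap='jet')

bin_idx = hexbin_point_indices(x, y, gridsize=25)
n_bins = len(hexbin.get_offsets())
list_idx = points_per_bin(bin_idx, n_bins)

# Cross-check: per-bin counts should equal the values matplotlib coloured
counts = np.array([len(idx) for idx in list_idx])
print(np.array_equal(counts, np.asarray(hexbin.get_array())))
```

This stays a handful of vectorised NumPy passes, so it should scale to millions of points. If `mincnt` is passed to `hexbin`, matplotlib drops the filtered bins from `get_offsets()`, so these indices would then need remapping.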
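If the geometric route of the first option is kept, the optimisation mentioned there (not re-testing allocated points) can be sketched without creating one shapely `Point` per data point. This swaps shapely for `matplotlib.path.Path.contains_points`, which is a different library call than in the proof of concept; the `unassigned` mask is an illustrative addition. It still loops over every hexagon, so it should be slower than reproducing matplotlib's index arithmetic:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from matplotlib import pyplot as plt
from matplotlib.path import Path

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=20, size=100)
y = rng.normal(loc=1000, scale=20, size=100)

fig, ax = plt.subplots(figsize=(10, 8))
hexbin = ax.hexbin(x, y, gridsize=25, cmap='jet')

# Reference hexagon (relative to its centre) and the hexagon centres
hex_path = hexbin.get_paths()[0]
list_offsets = hexbin.get_offsets()

points = np.column_stack([x, y])
unassigned = np.ones(len(points), dtype=bool)  # points not yet placed in a hexagon
list_idx = []
for offset in list_offsets:
    candidates = np.flatnonzero(unassigned)
    hexagon = Path(hex_path.vertices + offset, hex_path.codes)
    inside = hexagon.contains_points(points[candidates])
    idx = candidates[inside]
    unassigned[idx] = False  # never test these points again
    list_idx.append(idx)
```

A point lying exactly on an edge shared by two hexagons goes to whichever hexagon is tested first.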
I hope that someone will be able to help me.
Thanks a lot in advance,
Patrick