Pandas/Python: 2D histogram fails with ValueError


I am trying to create a 2D histogram from a pandas DataFrame "rates". The x and y axes are supposed to be transforms of the DataFrame columns, i.e., x and y are 'scaled' from the original columns, and the bin heights are the number of hits in each x/y bin.

import numpy, pylab, pandas
import matplotlib.pyplot as plt

list(rates.columns.values)
['sizes', 'transfers', 'positioning']

x=(rates["sizes"]/1024./1024.)
y=((rates["sizes"]/rates["transfers"])/1024.)+rates["positioning"]

I then feed them into NumPy's 2D histogram with

histo, xedges, yedges = numpy.histogram2d(x, y, bins=(100,100))

However, this fails with

File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
 hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py" line 363, in histogramdd
 decimal = int(-log10(mindiff)) + 6
ValueError: cannot convert float NaN to integer
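The failure can be reproduced without pandas. The sketch below is a minimal example under the assumption that one of the y values is infinite (as it would be if a "transfers" value were 0); the arrays are made-up sample data. The exact message depends on the NumPy version: the old release in the traceback above fails inside its edge computation, while newer releases report that the autodetected range is not finite, but both raise ValueError.

```python
import numpy as np

# Made-up sample data: a single infinite y coordinate is enough to break binning.
x = np.array([0.0, 1.0, 2.0])
y = np.array([np.inf, 1.5, 2.5])

failed = False
try:
    np.histogram2d(x, y, bins=(100, 100))
except ValueError as err:
    # The bin edges cannot be derived from a range that includes inf.
    failed = True
    print("histogram2d failed:", err)
```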

I have already dropped all NaNs from my frame with 'rates.dropna()', but judging from the error I guess the problem is not due to NaNs in my frame.
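Two details of dropna() matter here. By default it returns a new object rather than modifying the frame in place, so its result has to be assigned. It also removes only NaN, not infinities. The following sketch uses a small hypothetical frame with the question's column names, where a zero in "transfers" produces inf in y:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has transfers == 0.
rates = pd.DataFrame({"sizes": [1024.0, 2048.0],
                      "transfers": [2, 0],
                      "positioning": [0.0, 0.0]})
y = ((rates["sizes"] / rates["transfers"]) / 1024.) + rates["positioning"]

# dropna() returns a new Series; inf is not NaN, so it is kept.
cleaned = y.dropna()
print(np.isinf(cleaned).any())  # True

# Converting infinities to NaN first lets dropna() remove them.
cleaned = y.replace([np.inf, -np.inf], np.nan).dropna()
print(np.isinf(cleaned).any())  # False
```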

Does anybody have an idea what is going wrong here?

Best answer, by THX:

With help from @jme I got on the right track.

I had not checked for problematic value pairs. A pair such as x:y = 0.0:inf obviously cannot be binned into a 2D histogram, so when transforming the original values I have to catch such cases.
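One way to catch such cases is to keep only the pairs where both coordinates are finite before calling histogram2d. This is a sketch, not necessarily what the answerer did; the "rates" values are made up for illustration, and np.isfinite is used because it rejects inf, -inf and NaN in a single test.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's columns; row 2 has transfers == 0,
# which makes y infinite.
rates = pd.DataFrame({
    "sizes": [2.0 * 1024**2, 4.0 * 1024**2, 8.0 * 1024**2],
    "transfers": [4, 0, 16],
    "positioning": [0.5, 1.0, 1.5],
})

x = rates["sizes"] / 1024. / 1024.
y = ((rates["sizes"] / rates["transfers"]) / 1024.) + rates["positioning"]

# Boolean mask: True only where both x and y are finite numbers.
mask = np.isfinite(x) & np.isfinite(y)

# Pass plain NumPy arrays of the valid pairs to histogram2d.
histo, xedges, yedges = np.histogram2d(x[mask].to_numpy(),
                                       y[mask].to_numpy(),
                                       bins=(100, 100))
```

Filtering pairs with one combined mask keeps x and y aligned; dropping bad values from each Series separately could leave them with different lengths.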

Another thing: numpy.histogram2d had some issues for me with DataFrame Series, so I had to convert each Series to a proper numpy.array before it would plot properly, e.g.,

import numpy as np
histo, xedges, yedges = np.histogram2d(np.array(x.iloc[1:MAX]), np.array(y.iloc[1:MAX]), bins=(100, 100))

where MAX is some variable used to slice the series (the positional slice 1:MAX starts at the second element).