Numpy sum function returns 1.67772e+07


I have two large (432*136*136*46) numpy.ndarray objects, H1 and H2, which hold altitude values from two simulations. I want to build an array that contains 1 where H1 and H2 have the same altitude and 0 where they don't. Then I want to know how many elements were selected, so I compute the sum of the elements of this array. Here is my code:

H1 = np.concatenate([np.around(files1[i].hrtm()[:, 0:45, :, :] / h) for i in range(len(files1))])
H2 = np.concatenate([np.around(files2[i].hrtm()[:, 0:45, :, :] / h) for i in range(len(files2))])

diff=np.absolute(H1-H2)
diff[diff==0.]=np.float64(-1.)
diff[diff!=-1]=np.float64(0.)
diff=diff*diff

print(np.sum(diff))

And here is my output, which is always the same and does not depend on the data:

1.67772e+07

After some research, I read that this is related to the maximum size of a float. I tried several types, replacing np.float64 with int, float, np.float32, or nothing at all, and they all give the same result.

Do you have any idea how I can get the correct count?

Answer by Daniel (best answer):

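The dtype of your diff array is the dtype of H1 and H2, so assigning np.float64 values into it does not change the type used for the sum. The value 1.67772e+07 is 2^24 = 16777216, the point beyond which a 32-bit float can no longer represent every integer, which suggests the arrays are float32: once the running sum reaches that value, adding 1 no longer changes it. Since you are only adding many 1s, you can convert diff to bool:

print(diff.astype(bool).sum())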

or, much more simply:

print((H1 == H2).sum())
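To illustrate why the float sum stalls and why counting booleans avoids the problem, here is a minimal sketch. It assumes the arrays are float32 (the question does not state the dtype), and the small H1 and H2 arrays below are hypothetical stand-ins for the real data.

```python
import numpy as np

# float32 has a 24-bit significand, so 2**24 = 16777216 is the last point
# up to which every integer is exactly representable.
limit = np.float32(2**24)
stalled = limit + np.float32(1)  # the result rounds back to 16777216

# Hypothetical stand-in data, not the original simulation output.
H1 = np.array([1, 2, 3, 4], dtype=np.float32)
H2 = np.array([1, 0, 3, 0], dtype=np.float32)

matches = (H1 == H2)                # boolean array, True where values agree
count = np.count_nonzero(matches)   # integer count, no float rounding involved
count_via_sum = matches.sum()       # booleans are summed as integers

# If a float array of 0s and 1s is required, a wider accumulator can be requested.
wide = np.sum(matches.astype(np.float32), dtype=np.float64)

print(stalled == limit, count, count_via_sum, wide)
```

Whether a plain float32 sum actually stalls at 2^24 depends on how NumPy accumulates for a given version and memory layout, so counting integers or booleans is the more robust choice.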

But since floating-point values are not exact, you may want to test for very small differences instead:

print((abs(H1 - H2) < 1e-30).sum())
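The tolerance comparison can also be written with np.isclose, as in the sketch below. The tolerance tol = 1e-3 and the sample arrays are assumptions chosen for illustration only; a suitable threshold depends on the units and scale of the altitude data.

```python
import numpy as np

# Hypothetical stand-in data, not the original simulation output.
H1 = np.array([100.0, 250.0, 399.9995, 500.0], dtype=np.float32)
H2 = np.array([100.0, 251.0, 400.0, 500.0], dtype=np.float32)

tol = 1e-3  # assumed absolute tolerance, to be adapted to the data

# Manual absolute-difference test, counted as integers.
n_close = np.count_nonzero(np.abs(H1 - H2) < tol)

# Same idea with np.isclose; rtol=0.0 keeps the test purely absolute.
n_isclose = np.count_nonzero(np.isclose(H1, H2, rtol=0.0, atol=tol))

# Exact equality for comparison.
n_exact = np.count_nonzero(H1 == H2)

print(n_close, n_isclose, n_exact)
```

Since the arrays in the question are already rounded with np.around, exact equality may well be sufficient there; the tolerance version is mainly useful for unrounded values.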