mean() of column in pandas DataFrame returning inf: how can I solve this?

Question


24.5k views Asked by Augusto Ribas At 11 June 2015 at 13:44

I'm trying to implement some machine learning algorithms, but I'm having some difficulties putting the data together.

In the example below, I load an example dataset from UCI, remove rows with missing data (thanks to help from a previous question), and now I would like to normalize the data.

For many datasets, I just used:

valores = (valores - valores.mean()) / (valores.std())

But for this particular dataset the approach above doesn't work. The problem is that the mean function is returning inf, perhaps due to a precision issue. See the example below:

import pandas as pd

bcw = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None)

# Drop rows that hold the '?' missing-value marker in any non-integer column
for col in bcw.columns:
    if bcw[col].dtype != 'int64':
        print("Removing possible '?' in column %s..." % col)
        bcw = bcw[bcw[col] != '?']

valores = bcw.iloc[:, 1:10]
# mean returns inf
print(valores.iloc[:, 5].mean())

My question is how to deal with this. It seems that I need to change the type of this column, but I don't know how to do it.


There are 5 answers

ali_m On 11 June 2015 at 14:07

NaN values should not matter when computing the mean of a pandas.Series. Precision is also irrelevant. The only explanation I can think of is that one of the values in valores is equal to infinity.

You could exclude any values that are infinite when computing the mean like this:

import numpy as np

is_inf = valores.iloc[:, 5] == np.inf
# .ix has been removed from pandas; filter the same positional column with the boolean mask
valores.iloc[:, 5][~is_inf].mean()
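To check this diagnosis on a numeric column, it may help to inspect the dtype and keep only the finite values. The sketch below uses a small made-up Series (`col` is hypothetical, not the UCI data) and `np.isfinite`, which also drops `-inf` and NaN. It assumes the column is already numeric; on an object (string) column `np.isfinite` would fail.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column containing one infinite value
col = pd.Series([1.0, 2.0, np.inf, 4.0])
print(col.dtype)  # confirm the column is a float dtype, not object

# Keep only finite entries (drops +inf, -inf and NaN), then average
finite = col[np.isfinite(col)]
print(finite.mean())
```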

BrotherJack On 24 October 2015 at 17:34

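I had the same problem with a column that was of dtype 'O' (object), and whose max value was 9999. Have you tried using the convert_objects method with the convert_numeric=True parameter? This fixed the problem for me. (Note that convert_objects has since been removed from pandas; pd.to_numeric is its replacement.)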
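In current pandas the same conversion can be done with `pd.to_numeric`. This is a minimal sketch on a made-up object column (`col` is hypothetical, not the UCI data); passing `errors="coerce"` turns unparseable entries such as '?' into NaN, which `mean()` skips by default.

```python
import pandas as pd

# Hypothetical object-dtype column with a '?' missing-value marker
col = pd.Series(["1", "10", "?", "3"])

# Convert strings to numbers; anything unparseable becomes NaN
numeric = pd.to_numeric(col, errors="coerce")
print(numeric.dtype)
print(numeric.mean())  # NaN is ignored when averaging
```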

gil.fernandes On 12 October 2018 at 07:08

If the elements of the pandas Series are strings, you get inf as the mean result. In this specific case you can simply convert the Series elements to float and then calculate the mean. No need to use NumPy.

Example:

valores.iloc[:,5].astype(float).mean()
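A plausible reason strings lead to inf, as far as I understand it, is that older pandas versions "summed" the strings by concatenating them and then converted the resulting long digit string to a float; newer versions may raise a TypeError instead. The pure-Python sketch below only illustrates that mechanism with made-up values and does not reproduce pandas internals.

```python
# Pure-Python illustration; no pandas involved.
values = ["1", "10", "2", "4"] * 100  # hypothetical column of digit strings

# "Summing" strings concatenates them instead of adding numbers
concatenated = "".join(values)

# A digit string this long is far beyond the float range, so it becomes inf
as_float = float(concatenated)
print(len(concatenated), as_float)
```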

Florian Brucker On 26 September 2022 at 10:14

For me, the reason was an overflow: my original data was in float16 and calling .mean() on that would return inf. After converting my data to float32 (e.g. via .astype("float32")), .mean worked as expected.
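The overflow described here can be sketched with NumPy alone: float16 cannot represent values above 65504, so a sum that exceeds this limit becomes inf. The array below is made up for illustration, and the example shows the float16 sum rather than pandas' own `.mean()` path, which may differ between versions.

```python
import numpy as np

# Hypothetical data: 1000 values of 100.0 stored as float16
data16 = np.full(1000, 100.0, dtype=np.float16)

# The float16 result cannot hold 100000 (float16 max is 65504)
total16 = data16.sum()

# After widening to float32 the mean is computed without overflow
mean32 = data16.astype("float32").mean()
print(total16, mean32)
```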

Dave On 11 June 2015 at 13:51 BEST ANSWER

I'm not so familiar with pandas, but if you convert to a NumPy array it works. Try:

np.asarray(valores.iloc[:, 5], dtype=float).mean()
