Value too large for dtype('float64') sklearn.preprocessing.StandardScaler()


When I try to execute this in Python:

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
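For context, StandardScaler standardizes each column to zero mean and unit variance, z = (x - mean) / std. Before it does that, scikit-learn validates the input and raises the error above if any entry is NaN or infinite; the formula itself has no trouble with ordinary finite numbers. The sketch below reimplements only the formula in plain NumPy on a small hypothetical array. It is an illustration, not scikit-learn's actual implementation (which also handles zero-variance columns, for example).

```python
import numpy as np

def standardize(X):
    """Column-wise (x - mean) / std, using the population std (ddof=0)."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

# Hypothetical 3x2 training matrix, for illustration only
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
Z = standardize(X_demo)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```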

I know several posts have the same title, but in most of them the problem was NaN values in the data. I don't think that is my case: I checked with the function below, which tells whether the array contains any NaN or infinite value:

import numpy
numpy.isfinite(X_train).all()

where X_train is my float array
( https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.isfinite.html )
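numpy.isfinite returns an element-wise boolean array that is False for NaN, +inf and -inf, so .all() is True only when every entry is finite. The sketch below (the helper name report_non_finite and the sample values are hypothetical) prints that check together with NaN and inf counts. Printing matters: in a script, as opposed to an interactive shell, a bare expression such as numpy.isfinite(X_train).all() is evaluated silently and its result is discarded.

```python
import numpy as np

def report_non_finite(X):
    """Print whether X is fully finite, and count NaN and +/-inf entries."""
    X = np.asarray(X, dtype=np.float64)
    all_finite = bool(np.isfinite(X).all())
    n_nan = int(np.isnan(X).sum())
    n_inf = int(np.isinf(X).sum())
    print(f"all finite: {all_finite}, NaN: {n_nan}, inf: {n_inf}")
    return all_finite, n_nan, n_inf

# Hypothetical column with one NaN and one infinity
X_demo = np.array([4068.725, np.nan, -15.303, np.inf])
result = report_non_finite(X_demo)  # prints: all finite: False, NaN: 1, inf: 1
```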

Here is the kind of data in X_train. Could the range of these values, from about -15 to 4000 with an outlier above 400,000, be too wide for scaling to be possible?

  4068.725
  4035.808
  4067.000
  4051.697
412761.343
   101.905
  4050.460
  4067.000
   -15.303
     1.099
    52.363
    56.739
    68.997
    72.410
    62.171
  4052.077
  4033.867
    33.385
  4050.690
  4031.547
    19.143
    13.494
    11.298
    43.261
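A wide range on its own should not trigger this error: the largest finite float64 is about 1.8e308, far above anything in the sample. The sketch below standardizes the values posted above as a single feature column with the same (x - mean) / std formula and checks that every result is finite. It assumes these values are representative of X_train.

```python
import numpy as np

# The sample values posted above, as a single feature column
values = np.array([
    4068.725, 4035.808, 4067.000, 4051.697, 412761.343, 101.905,
    4050.460, 4067.000, -15.303, 1.099, 52.363, 56.739,
    68.997, 72.410, 62.171, 4052.077, 4033.867, 33.385,
    4050.690, 4031.547, 19.143, 13.494, 11.298, 43.261,
]).reshape(-1, 1)

# float64 can represent magnitudes up to about 1.8e308
print(np.finfo(np.float64).max)

# Column-wise standardization, (x - mean) / std
z = (values - values.mean(axis=0)) / values.std(axis=0)
print(np.isfinite(z).all())  # True: the wide range alone does not overflow
```

The outlier at 412761.343 will dominate the mean and standard deviation, which affects how useful the scaled values are, but it does not make the computation fail.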

There is 1 answer

spacycookie

My bad. I misread the result of

numpy.isfinite(X_train).all()

It returns True only if all the values are finite (neither NaN nor infinite), and for my array it actually returns False. A good way to find the sneaky values is to execute the code below:

numpy.argwhere(numpy.isnan(X_train))
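Note that numpy.isnan only finds NaN; using ~numpy.isfinite also catches +inf and -inf. On a 2-D array, numpy.argwhere returns one (row, column) pair per match. The sketch below shows both on a hypothetical matrix and then drops every row containing a non-finite value, which is one simple option before calling fit_transform, not the only one.

```python
import numpy as np

# Hypothetical 4x2 feature matrix with one NaN and one infinity
X_demo = np.array([[4068.725, 1.099],
                   [np.nan,   52.363],
                   [-15.303,  np.inf],
                   [101.905,  13.494]])

# isnan() finds NaN only; ~isfinite() also catches +inf and -inf
print(np.argwhere(np.isnan(X_demo)))      # [[1 0]]
print(np.argwhere(~np.isfinite(X_demo)))  # [[1 0]
                                          #  [2 1]]

# Keep only the rows in which every value is finite
keep = np.isfinite(X_demo).all(axis=1)
X_clean = X_demo[keep]
print(X_clean.shape)  # (2, 2)
```

If you drop rows this way, drop the same rows from the targets (e.g. y_train[keep], assuming a y_train array aligned with X_train) so features and labels stay matched. Filling in the missing values instead, for example with sklearn.impute.SimpleImputer, is the usual alternative.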

(I found some NaN values in my array.) By the way, thanks for correcting my question, user3666197; I am quite a newbie here.