I am trying to build a prediction model but keep getting this error: raise ValueError("Input contains NaN") ValueError: Input contains NaN. I tried np.all(np.isfinite(dataframe)) and np.any(np.isnan(dataframe)) to check for the missing values, but they only raise new errors. For example: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.

Here is the code so far:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np

dataframe = pd.read_csv('file.csv', delimiter=',')

le = LabelEncoder()
dfle = dataframe

dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')

newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]

X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values

y = dfle.column3

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ohe = OneHotEncoder()

ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# np.all(np.isfinite(dfle))
# np.any(np.isnan(dfle))
X = ohe.fit_transform(X).toarray()

There are 2 answers

Atif Rizwan (best answer)

You can deal with this error in several ways. First, you can fill the NaN values with 0:

dataframe = pd.read_csv('file.csv', delimiter=',').fillna(0)

Alternatively, you can use scikit-learn's imputation techniques to fill the NaN values:

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute

Several imputation techniques are available, but I would suggest using KNNImputer.
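To make the two options concrete, here is a minimal sketch. The small frame is made-up data standing in for file.csv, the column names are only borrowed from the question, and n_neighbors=2 is an arbitrary choice for illustration. Note that KNNImputer expects numeric input, so text columns would have to be encoded or left out first.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric data standing in for the contents of file.csv.
df = pd.DataFrame({
    "column1": [1.0, 2.0, np.nan, 4.0],
    "column2": [10.0, np.nan, 30.0, 40.0],
    "column4": [0.5, 0.6, 0.7, 0.8],
})

# Option 1: replace every missing value with 0.
filled_with_zero = df.fillna(0)

# Option 2: replace each missing value with the average of that column
# over the 2 most similar rows (similarity uses the values that are present).
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# Both results should report zero remaining missing values.
print(filled_with_zero.isna().sum().sum())
print(imputed.isna().sum().sum())
```

Filling with 0 is the simplest choice but can distort a column whose real values are far from zero; a neighbour-based fill keeps the imputed values in the range of the observed data.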

Alex Newman

The error

TypeError: ufunc 'isfinite' not supported for the input types,
and the inputs could not be safely coerced to any supported types
according to the casting rule ''safe''

is probably because you're converting to str when doing col.astype(str). Use something like astype(float) instead.
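A small sketch of how the TypeError can arise, and one way around it. The frame below is made-up data, not the asker's file. It assumes the check is run on a column holding non-numeric (object) values, which np.isnan and np.isfinite cannot handle, whereas pandas' own isna check works for any dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column and one numeric column, each with a gap.
df = pd.DataFrame({
    "column1": ["a", None, "c"],
    "column2": [1.0, np.nan, 3.0],
})

# np.isnan only understands numeric dtypes, so the object column raises.
try:
    np.isnan(df["column1"].to_numpy())
    isnan_failed = False
except TypeError:
    isnan_failed = True

# DataFrame.isna works regardless of dtype.
has_missing = df.isna().any().any()

# On a column that really is numeric, astype(float) makes np.isnan usable.
numeric_nan_count = int(np.isnan(df["column2"].astype(float)).sum())

print(isnan_failed, has_missing, numeric_nan_count)
```

Keep in mind that astype(str) turns a missing value into the literal string 'nan', so a missing-value check run after that conversion will no longer find it.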

As for the NaN error, you need to figure out whether it is feasible to simply replace the missing values with zeros (fillna(0)), or whether something more complex is needed, such as a Kalman filter.
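One way to judge whether a plain fillna(0) is reasonable is to look at how much of each column is actually missing. The sketch below uses made-up data, and the 0.5 cutoff is an arbitrary, hypothetical threshold rather than a recommended value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column1 has one gap, column2 is mostly empty.
df = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0, 4.0],
    "column2": [np.nan, np.nan, np.nan, 8.0],
})

# Fraction of missing values per column.
missing_share = df.isna().mean()

# Columns above the (hypothetical) cutoff probably need more thought
# than a blanket fill with zeros.
sparse_columns = missing_share[missing_share > 0.5].index.tolist()

print(missing_share)
print(sparse_columns)
```

If only a small share of a column is missing, a simple fill may be good enough; a column that is mostly empty is a candidate for a more careful method or for being dropped.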