How will the Imputers work if all the values in a column are missing in the input vector in sklearn

Question

220 views Asked by Jibin Mathew At 26 December 2016 at 09:02

I have a dataset with a large number of columns. I have programmed my application so that if any value in a given column is missing, it is filled in by an imputer that uses the mean as its strategy.

However, I am a bit concerned about the case where every value in a column is missing: how would the imputer behave, and what would be the right approach in that case?

Original Q&A

There is 1 answer

**KevinD** · Answer 1 · 2016-12-26T11:32:38+00:00

If in a given column, all data is missing, then the Imputer will discard that column.

Here is an example, with 5 samples and 2 columns, with one sample having a missing value:

import numpy as np
from sklearn.preprocessing import Imputer  # available up to scikit-learn 0.21

X = np.array([[1, 1], [1, 2], [1, 1], [1, 2], [1, np.nan]])
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
print(imputer.fit_transform(X))

This prints out

[[ 1.   1. ]
 [ 1.   2. ]
 [ 1.   1. ]
 [ 1.   2. ]
 [ 1.   1.5]]

However, if all data in the second column is missing:

X = np.array([[1,np.nan],[1,np.nan],[1,np.nan],[1,np.nan],[1,np.nan]])
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
print(imputer.fit_transform(X))

We obtain:

[[ 1.]
 [ 1.]
 [ 1.]
 [ 1.]
 [ 1.]]
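
The `Imputer` class used above was removed in scikit-learn 0.22. As a minimal sketch, assuming a recent release where `SimpleImputer` from `sklearn.impute` is the replacement, the same check can be repeated; the variable name `X_out` is only illustrative. With the mean strategy, a column that had no observed values at fit time is still dropped from the output:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Second column contains no observed values at all
X = np.array([[1, np.nan], [1, np.nan], [1, np.nan], [1, np.nan], [1, np.nan]])

# Column-wise mean imputation; missing values are marked by np.nan
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_out = imputer.fit_transform(X)

# Only the first column survives the transform
print(X_out.shape)
```

Recent releases may also emit a warning naming the skipped feature. To my knowledge, versions 1.2 and later accept `keep_empty_features=True` to keep such a column instead of dropping it, but check the documentation of the installed version before relying on that.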

This default behaviour could be the right approach in that case, because this column (i.e. this feature) contains no observed values and cannot be used anyway.
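
Because a dropped column shifts the indices of every column after it, it can help to know in advance which columns are entirely missing. The following is a sketch using plain NumPy, not something from the scikit-learn API; the names `all_missing`, `empty_columns` and `X_observed` are hypothetical:

```python
import numpy as np

X = np.array([[1, np.nan], [1, np.nan], [1, np.nan], [1, np.nan], [1, np.nan]])

# Boolean mask: True for each column in which every entry is NaN
all_missing = np.isnan(X).all(axis=0)

# Indices of the columns that have no observed value
empty_columns = np.flatnonzero(all_missing)
print(empty_columns)  # [1]

# Keep only the columns with at least one observed value
X_observed = X[:, ~all_missing]
print(X_observed.shape)  # (5, 1)
```

With the list of empty columns in hand, the application can log them, remove them explicitly before imputation, or fill them with a domain-specific default, rather than being surprised by a narrower output array.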
