I have a dataset with a large number of columns. I have programmed my application so that any missing value in a column is filled in by an imputer, using the mean as the imputation strategy.
However, I am a bit concerned: if all the values in an entire column are missing, how will the imputer behave, and what would be the right approach in such a case?
If all the data in a given column is missing, the Imputer will discard that column.
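As an example, take 4 samples and 2 columns, with one sample missing its value in the second column.
The imputer fills that entry with the mean of the other three values in the column, and the output keeps both columns.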
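A minimal sketch of such a case, assuming scikit-learn's `SimpleImputer` from `sklearn.impute` (the successor to the older `Imputer` class). The array values are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# 4 samples, 2 columns; the second sample is missing its second value.
# These numbers are illustrative only.
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [7.0, 10.0]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(X_imputed.shape)   # (4, 2): both columns are kept
print(X_imputed[1, 1])   # 6.0, the mean of 2, 6 and 10
```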
However, if all data in the second column is missing, the output contains only the first column.
The second column is dropped entirely.
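The all-missing case can be sketched the same way, again assuming `SimpleImputer` with its default settings and made-up values. Depending on the scikit-learn version, a warning about the skipped feature may also be emitted:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same layout, but every value in the second column is missing.
X_empty = np.array([[1.0, np.nan],
                    [3.0, np.nan],
                    [5.0, np.nan],
                    [7.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_empty_imputed = imputer.fit_transform(X_empty)

# No mean can be computed for the second column, so it is discarded.
print(X_empty_imputed.shape)  # (4, 1)
```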
This default behaviour could be the right approach in that case, because this column (i.e. this feature) cannot be used anyway.
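Because dropped columns shift the positions of the remaining ones, it can help to find the all-missing columns before imputing. A minimal NumPy sketch, where the data and `feature_names` are hypothetical:

```python
import numpy as np

# Illustrative data: column "b" has no observed values at all.
X_check = np.array([[1.0, np.nan, 4.0],
                    [3.0, np.nan, np.nan],
                    [5.0, np.nan, 8.0]])
feature_names = ["a", "b", "c"]  # hypothetical column labels

# True for each column in which every entry is NaN.
all_missing = np.isnan(X_check).all(axis=0)

dropped = [name for name, empty in zip(feature_names, all_missing) if empty]
kept = [name for name, empty in zip(feature_names, all_missing) if not empty]

print(dropped)  # ['b']
print(kept)     # ['a', 'c']
```

The `kept` list then matches the columns of the imputed output, assuming the imputer drops exactly the all-missing columns as described above.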