Drop column with low variance in pandas

I'm trying to drop the columns of my pandas DataFrame that have zero variance. I'm sure this has been answered somewhere, but I had a lot of trouble finding a thread on it. I found this thread, but when I tried its solution on my DataFrame, baseline, with the command

baseline_filtered=baseline.loc[:,baseline.std() > 0.0]

I got the error

    "Unalignable boolean Series provided as "

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

So, can someone tell me why I'm getting this error or provide an alternative solution?

There is 1 answer

jezrael (best answer)

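Some of your columns are non-numeric, and older pandas versions silently drop such columns when computing std (pandas 2.0 and later raise an error instead unless you pass numeric_only=True). The resulting boolean mask therefore has fewer entries than baseline has columns, and .loc cannot align it: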

import pandas as pd

baseline = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 1, 1, 1, 1, 1],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

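# The string columns A and F are missing from the mask.
# numeric_only=True is needed in pandas 2.0+; older versions skip them by default.
m = baseline.std(numeric_only=True) > 0.0
print(m)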
B     True
C     True
D    False
E     True
dtype: bool
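
To make the mismatch concrete, here is a minimal sketch that lists the column labels present in baseline but absent from the mask. The variable name missing is only illustrative, and numeric_only=True is passed so that the snippet also runs on pandas 2.0 and later.

```python
import pandas as pd

baseline = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 1, 1, 1, 1, 1],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

# Mask built from the numeric columns only
m = baseline.std(numeric_only=True) > 0.0

# Column labels that the mask has no entry for
missing = baseline.columns.difference(m.index)
print(list(missing))  # ['A', 'F']
```

Because the mask has no True/False value for A and F, .loc cannot decide whether to keep them, which is what the IndexingError reports.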

One possible solution is to extend the mask to all columns with reindex; fill_value then decides whether the string columns are kept (True) or removed (False):

# fill_value=True keeps the non-numeric columns A and F
baseline_filtered = baseline.loc[:, m.reindex(baseline.columns, fill_value=True)]
print(baseline_filtered)
   A  B  C  E  F
0  a  4  7  5  a
1  b  5  8  3  a
2  c  4  9  6  a
3  d  5  4  9  b
4  e  5  2  2  b
5  f  4  3  4  b

# fill_value=False drops the non-numeric columns A and F
baseline_filtered = baseline.loc[:, m.reindex(baseline.columns, fill_value=False)]
print(baseline_filtered)
   B  C  E
0  4  7  5
1  5  8  3
2  4  9  6
3  5  4  9
4  5  2  2
5  4  3  4
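
If you only want to remove the zero-variance numeric columns and leave everything else alone, you could also avoid boolean indexing and drop the columns by label. This is a rough sketch rather than the only way to do it; the names std and zero_var_cols are just illustrative.

```python
import pandas as pd

baseline = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 1, 1, 1, 1, 1],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

# Standard deviation of the numeric columns only
std = baseline.std(numeric_only=True)

# Labels of the numeric columns whose standard deviation is exactly zero
zero_var_cols = std.index[std == 0.0]

# Drop them by label; the non-numeric columns are left untouched
baseline_filtered = baseline.drop(columns=zero_var_cols)
print(baseline_filtered.columns.tolist())  # ['A', 'B', 'C', 'E', 'F']
```

Since drop works on labels, there is no mask to align, so the mix of numeric and string columns does not matter here.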

Another idea is to use DataFrame.nunique, which works with both string and numeric columns, and keep only the columns with more than one distinct value:

baseline_filtered = baseline.loc[:, baseline.nunique() > 1]
print(baseline_filtered)
   A  B  C  E  F
0  a  4  7  5  a
1  b  5  8  3  a
2  c  4  9  6  a
3  d  5  4  9  b
4  e  5  2  2  b
5  f  4  3  4  b
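
One thing to keep in mind with the nunique approach is that missing values are not counted by default (dropna=True), so a column holding a single value plus some NaN is treated as constant. The small sketch below uses a made-up frame df, not the data from the question, to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: x has one distinct value plus a missing value
df = pd.DataFrame({
    'x': [1.0, 1.0, np.nan],
    'y': [1.0, 2.0, 3.0],
})

print(df.nunique().tolist())              # [1, 3]  NaN is ignored
print(df.nunique(dropna=False).tolist())  # [2, 3]  NaN counts as a value

# Default behaviour: x is treated as constant and removed
kept_default = df.loc[:, df.nunique() > 1]

# With dropna=False: x is kept because NaN counts as a second value
kept_with_nan = df.loc[:, df.nunique(dropna=False) > 1]

print(kept_default.columns.tolist())   # ['y']
print(kept_with_nan.columns.tolist())  # ['x', 'y']
```

Which of the two you want depends on whether a missing value should count as variation in your data.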