Best way to remove dataframe columns where every value is the same

Let's say I have a DataFrame in which several columns hold the same value in every row, e.g.

import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Texas', 'Texas'],
                   'County': ['Harris', 'Harris', 'Harris', 'Harris'],
                   'Building': [1, 2, 3, 4],
                   'Budget': [7328, 3290, 8342, 4290]})

I want to write a function that simply scans the df and drops the columns where all values are the same.

I have come up with the function below, which also returns a separate Series x mapping each removed column to its single value.

def drop_monocols(df):                                  # takes a DataFrame
    x = dict()
    n = 0
    while n < df.shape[1]:
        if df.iloc[:, n].nunique() == 1:                # column where all values are the same
            x[df.columns[n]] = df.iloc[0, n]            # add colname: value to the dict
            df = df.drop(df.columns[n], axis=1)         # drop the constant column from df
        else:
            n += 1
    x = pd.Series(x)                                    # convert once, after the loop
    return x, df                                        # removed col: value Series and cleaned df

I am new to coding and want to understand if there is a simpler way. Can I use a for loop over the columns instead of a while loop? And is it possible to use .apply here instead of calling a function with df as the argument?

There is 1 answer

Answer by mozway

You can compare every row to the first row, check column by column whether all values match, and use the result as a boolean mask on the columns:

out = df.loc[:, ~df.eq(df.iloc[0]).all()]

A variant that keeps a column if any of its values differs from the first row:

out = df.loc[:, df.ne(df.iloc[0]).any()]

Or, with nunique, if you don't have NaNs (or want to ignore them, since nunique skips NaNs by default):

out = df.loc[:, df.nunique().gt(1)]

Output:

   Building  Budget
0         1    7328
1         2    3290
2         3    8342
3         4    4290
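
The question also asked for the removed columns and their values, and whether the function could be called as a method. Below is a minimal sketch of one way to get both from the same mask; the names `constant`, `removed` and `cleaned` are just illustrative, and it assumes the DataFrame has at least one row and that the constant columns contain no NaNs (NaN never compares equal to itself, so an all-NaN column would be kept).

```python
import pandas as pd


def drop_monocols(df):
    """Return (removed, cleaned); assumes df has at least one row."""
    # True for each column whose values all equal the value in the first row
    constant = df.eq(df.iloc[0]).all()
    # Series mapping each constant column name to its single value
    removed = df.iloc[0][constant]
    # Keep only the columns that vary
    cleaned = df.loc[:, ~constant]
    return removed, cleaned


df = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Texas', 'Texas'],
    'County': ['Harris', 'Harris', 'Harris', 'Harris'],
    'Building': [1, 2, 3, 4],
    'Budget': [7328, 3290, 8342, 4290],
})

# Plain function call
removed, cleaned = drop_monocols(df)

# Same call written as a method, using DataFrame.pipe
removed_piped, cleaned_piped = df.pipe(drop_monocols)

print(removed)
print(cleaned)
```

Here `.pipe` is probably closer to what was meant than `.apply`: `.apply` runs a function on each column or row separately, while `.pipe` passes the whole DataFrame to the function, so it works in a method chain.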