Let's say I have a DataFrame with multiple columns where the values are all the same, e.g.:

    import pandas as pd

    df = pd.DataFrame({'State': ['Texas', 'Texas', 'Texas', 'Texas'],
                       'County': ['Harris', 'Harris', 'Harris', 'Harris'],
                       'Building': [1, 2, 3, 4],
                       'Budget': [7328, 3290, 8342, 4290]})

I want to write a function that simply scans the df and drops the columns where all values are the same.
I have come up with the function below, which also returns a separate Series x of the removed columns and their unique values:

    def drop_monocols(df):  # takes df
        x = dict()
        n = 0
        while n < df.shape[1]:
            if df.iloc[:, n].nunique() == 1:  # finds columns where all values are the same
                x[df.columns[n]] = df.iloc[0, n]  # adds colname: value to dict
                df = df.drop(df.columns[n], axis=1)  # drops useless col from df
            else:
                n += 1
        x = pd.Series(x)
        return x, df  # returns useless col:value Series and cleaned df

I am new to coding and want to understand if there is a simpler way. Can I use a for loop over the columns instead of while? And is it possible to use .apply here instead of calling a function with df as the argument?
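You can compare every row to the first row and check, column by column, whether all values are the same; the resulting boolean mask can then be used for boolean indexing on the columns.
Variant: keep a column if any value differs from the first row.
Or use nunique if you don't have NaNs (or want to ignore NaNs).
For the example above, each approach leaves only the Building and Budget columns.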
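
The three approaches described above might be written roughly as follows. This is a sketch rather than the original answer's exact code: the names out_all, out_any and out_nunique are placeholders chosen here, and the rewritten drop_monocols is a hypothetical vectorized take on the question's helper that assumes the DataFrame has at least one row.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Texas', 'Texas'],
    'County': ['Harris', 'Harris', 'Harris', 'Harris'],
    'Building': [1, 2, 3, 4],
    'Budget': [7328, 3290, 8342, 4290],
})

# 1) Compare every row to the first row; drop columns where ALL values match.
out_all = df.loc[:, ~df.eq(df.iloc[0]).all()]

# 2) Variant: keep columns where ANY value differs from the first row.
out_any = df.loc[:, df.ne(df.iloc[0]).any()]

# 3) With nunique, which ignores NaNs by default (dropna=True).
out_nunique = df.loc[:, df.nunique().ne(1)]


def drop_monocols(df):
    """Hypothetical loop-free version of the question's helper.

    Returns (removed, cleaned): a Series mapping each constant column to
    its single value, and the DataFrame without those columns.
    Assumes df has at least one row.
    """
    mono = df.nunique().eq(1)            # True for single-valued columns
    removed = df.loc[:, mono].iloc[0]    # first row of the constant columns
    return removed, df.loc[:, ~mono]


removed, cleaned = drop_monocols(df)
print(removed.to_dict())        # constant columns and their values
print(list(cleaned.columns))    # remaining columns
```

Note that the first two variants treat NaN as different from NaN, so a column that is entirely NaN would be kept, whereas the nunique version skips NaNs when counting. No explicit loop or .apply is needed, because the comparison and nunique already work column-wise on the whole DataFrame.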