An elegant for-loop for iterating through data frame columns


I want to handle the skewness of a few columns in a data frame with this code, which caps each value at three standard deviations from the column mean:

import numpy as np

upper_limit = df['column1'].mean() + 3*df['column1'].std()
lower_limit = df['column1'].mean() - 3*df['column1'].std()
df['column1'] = np.where(df['column1'] > upper_limit, upper_limit, np.where(df['column1'] < lower_limit, lower_limit, df['column1']))
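For reference, the nested np.where above caps every value to the range [lower_limit, upper_limit], which is exactly what pandas' Series.clip does. The sketch below compares the two on a small, made-up example frame (the data is hypothetical; 'column1' stands in for any numeric column):

```python
import numpy as np
import pandas as pd

# Hypothetical example data: twenty zeros and one large outlier
df = pd.DataFrame({"column1": [0.0] * 20 + [100.0]})

# Compute the statistics once instead of twice per limit
mean, std = df["column1"].mean(), df["column1"].std()
upper_limit = mean + 3 * std
lower_limit = mean - 3 * std

# Original approach: nested np.where returns a NumPy array
capped_where = np.where(df["column1"] > upper_limit, upper_limit,
                        np.where(df["column1"] < lower_limit, lower_limit, df["column1"]))

# Equivalent: Series.clip replaces values below lower / above upper with the bound
capped_clip = df["column1"].clip(lower=lower_limit, upper=upper_limit)

print(np.allclose(capped_where, capped_clip.to_numpy()))  # True
```

Here the outlier 100.0 lies more than three standard deviations above the mean, so both versions replace it with upper_limit and leave the zeros untouched. Like np.where, clip leaves NaN values as NaN.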

It wouldn't be a problem to copy and paste this code separately for each column, but I'd like a more elegant approach. I made a few attempts at a for-loop, but they were too embarrassing to post.

I was wondering if someone here could come up with a clever, Pythonic variant: short and beautiful?

PS: I don't want to drop the outliers, and np.log() has already been applied.


@Yes's variant works perfectly for me:

import numpy as np

def handle_skewness(column):
    # Cap values outside mean +/- 3 standard deviations (statistics computed once)
    mean, std = column.mean(), column.std()
    upper_limit = mean + 3 * std
    lower_limit = mean - 3 * std
    return np.where(column > upper_limit, upper_limit, np.where(column < lower_limit, lower_limit, column))

# Iterate through the DataFrame's columns (X is the DataFrame)
for column in X.columns:
    # Only process numeric columns (customize this check as needed)
    if np.issubdtype(X[column].dtype, np.number):
        X[column] = handle_skewness(X[column])
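If an explicit loop is not required, the same capping can be applied to every numeric column in one step. This is a sketch, not part of the answer above: it swaps the np.issubdtype check for DataFrame.select_dtypes and uses Series.clip inside DataFrame.apply; the helper name cap_outliers, its n_std parameter, and the example data are hypothetical.

```python
import pandas as pd


def cap_outliers(df: pd.DataFrame, n_std: float = 3.0) -> pd.DataFrame:
    """Return a copy of df with each numeric column capped to mean +/- n_std * std.

    Hypothetical helper; non-numeric columns are left unchanged.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # apply() passes each column as a Series; clip() caps it to that column's own limits
    out[numeric_cols] = out[numeric_cols].apply(
        lambda s: s.clip(lower=s.mean() - n_std * s.std(),
                         upper=s.mean() + n_std * s.std())
    )
    return out


# Hypothetical example: one high outlier in "a", one low outlier in "b"
X = pd.DataFrame({
    "a": [0.0] * 20 + [100.0],
    "b": [1.0] * 20 + [-50.0],
    "label": ["x"] * 21,
})
X_capped = cap_outliers(X)
print(X_capped["a"].max() < 100, X_capped["b"].min() > -50)  # True True
```

Returning a copy keeps the original frame intact, which makes it easy to compare before and after; assign the result back to X if in-place behaviour like the loop above is preferred.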