Remove prefix from all column names

I would like to remove the prefix from all column names in a dataframe.

I tried writing a helper function and calling it in a for loop:

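def remove_prefix(name, prefix):
    # Return name without prefix if it starts with it, otherwise unchanged
    if name.startswith(prefix):
        return name[len(prefix):]
    return name

for x in df.columns:
    ...  # loop body not included in the question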
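Rebinding the loop variable x inside a for loop does not change df.columns; the cleaned names have to be collected and assigned back. A minimal sketch, assuming Python 3.9 or later for the built-in str.removeprefix (which strips the prefix only when it is present). The column names are hypothetical:

```python
# Hypothetical column names for illustration
columns = ["pre_A", "pre_B", "C"]

# str.removeprefix (Python 3.9+) returns the string unchanged when the prefix is absent
cleaned = [name.removeprefix("pre_") for name in columns]
print(cleaned)  # ['A', 'B', 'C']
```

With a real DataFrame, the same list comprehension would be assigned back with df.columns = [...].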

5 Answers

blue_note

Use the DataFrame.rename method, which accepts a function to apply to each column name:

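import pandas as pd

def remove_prefix(prefix):
    # Strip prefix only from names that start with it; leave others unchanged
    return lambda x: x[len(prefix):] if x.startswith(prefix) else x

frame = pd.DataFrame(dict(x_a=[1, 2, 3], x_b=[4, 5, 6]))
frame = frame.rename(remove_prefix('x_'), axis='columns')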
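rename also accepts a dict through the columns keyword, and labels missing from the dict are left untouched. A small sketch that builds the mapping only for prefixed names (the extra column c is added here for illustration):

```python
import pandas as pd

prefix = "x_"
frame = pd.DataFrame(dict(x_a=[1, 2, 3], x_b=[4, 5, 6], c=[7, 8, 9]))

# Map only the prefixed names; rename keeps labels that are not in the dict
mapping = {col: col[len(prefix):] for col in frame.columns if col.startswith(prefix)}
frame = frame.rename(columns=mapping)
print(list(frame.columns))  # ['a', 'b', 'c']
```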

yatu

You can use str.lstrip to strip the prefix from the column names; this avoids looping and checking which names contain the prefix:

import pandas as pd

# Example dataframe
df = pd.DataFrame(columns=['pre_A', 'pre_B', 'C'])
df.columns = df.columns.str.lstrip('pre_')

Resulting in:

# Index(['A', 'B', 'C'], dtype='object')

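Note: lstrip treats its argument as a set of characters, not as a prefix, so it removes any leading run of p, r, e and _. This strips repeated prefixes such as pre_pre_ and can also cut into names that merely start with those letters (pre_rate becomes ate).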
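A quick plain-Python check of what lstrip('pre_') does to a few hypothetical names, next to a startswith-based version that removes the prefix exactly once:

```python
names = ["pre_A", "pre_rate", "reC", "pre_pre_B"]

# lstrip removes any leading run of the characters 'p', 'r', 'e', '_'
stripped = [n.lstrip("pre_") for n in names]
print(stripped)  # ['A', 'ate', 'C', 'B']

# Removing the prefix once, and only when it is actually present
safe = [n[len("pre_"):] if n.startswith("pre_") else n for n in names]
print(safe)  # ['A', 'rate', 'reC', 'pre_B']
```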

AkshayNevrekar

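Use str.replace in a list comprehension (note that this removes every occurrence of prefix, not only a leading one):

prefix = 'pre_'  # the prefix to remove
df.columns = [i.replace(prefix, "") for i in df.columns]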
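To see how str.replace treats a name that contains the prefix somewhere other than the start, here is a plain-Python comparison with a startswith guard (the names are hypothetical):

```python
prefix = "pre_"
names = ["pre_A", "col_pre_B", "C"]

# replace() drops every occurrence, including one in the middle of a name
replaced = [n.replace(prefix, "") for n in names]
print(replaced)  # ['A', 'col_B', 'C']

# Checking startswith first restricts the change to a leading prefix
prefixed_only = [n[len(prefix):] if n.startswith(prefix) else n for n in names]
print(prefixed_only)  # ['A', 'col_pre_B', 'C']
```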

Angelo Mendes

You can read the file without headers by passing header=None:

import pandas
df = pandas.read_csv(filepath_or_buffer=filename, header=None, sep=',')
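With header=None, read_csv does not use the first line as column names: the columns get integer labels 0, 1, ..., and the original header line becomes the first data row. A small sketch, with io.StringIO standing in for a real file and made-up CSV contents:

```python
import io

import pandas as pd

# In-memory stand-in for a real CSV file (contents are hypothetical)
csv_text = "pre_A,pre_B\n1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text), header=None, sep=",")
print(list(df.columns))     # [0, 1]
print(df.iloc[0].tolist())  # ['pre_A', 'pre_B']
```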

jezrael

Use Series.str.replace with the regex ^ to match the start of the string:

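import pandas as pd

df = pd.DataFrame(columns=['pre_A', 'pre_B', 'pre_predmet'])
# regex=True is required in pandas 2.0+, where str.replace matches literally by default
df.columns = df.columns.str.replace('^pre_', '', regex=True)
print(df)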
Empty DataFrame
Columns: [A, B, predmet]
Index: []

Another solution is to use a list comprehension with re.sub:

import re

df.columns = [re.sub('^pre_',"", x) for x in df.columns]
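When the ^-anchored pattern is built from a prefix variable, regex metacharacters in the prefix (such as a dot) need escaping, for example with re.escape; the same applies to str.replace with regex=True. A sketch with a hypothetical prefix 'v1.' and made-up names:

```python
import re

prefix = "v1."  # hypothetical prefix containing a regex metacharacter
names = ["v1.A", "v1xB", "C"]

# Unescaped, "." matches any character, so "v1xB" is stripped too
naive = [re.sub("^" + prefix, "", n) for n in names]
print(naive)  # ['A', 'B', 'C']

# re.escape makes the dot literal, so only the real prefix is removed
escaped = [re.sub("^" + re.escape(prefix), "", n) for n in names]
print(escaped)  # ['A', 'v1xB', 'C']
```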