Convert pandas column names from snake case to camel case

I have a pandas DataFrame whose column names are in upper snake case. I want to convert them to camel case, with the first word starting with a lowercase letter. The following code is not working for me. Please let me know how to fix it.

import pandas as pd

# Sample DataFrame with column names
data = {'RID': [1, 2, 3],
        'RUN_DATE': ['2023-01-01', '2023-01-02', '2023-01-03'],
        'PRED_VOLUME_NEXT_360': [100, 150, 200]}

df = pd.DataFrame(data)

# Convert column names to lowercase
df.columns = df.columns.str.lower()

# Convert column names to camel case with lowercase starting letter
df.columns = [col.replace('_', ' ').title().replace(' ', '').replace(col[0], col[0].lower(), 1) for col in df.columns]

# Print the DataFrame with updated column names
print(df)

I want the column names RID, RUN_DATE, PRED_VOLUME_NEXT_360 to be converted to rid, runDate, predVolumeNext360, but the code gives Rid, RunDate and PredVolumeNext360.

There are 5 answers

mozway On BEST ANSWER

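You could lowercase the names and then use a regex to replace each _x with X (the underscore is dropped and the following character is uppercased):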

df.columns = (df.columns.str.lower()
                .str.replace('_(.)', lambda x: x.group(1).upper(),
                             regex=True)
             )

Or with a custom function:

def to_camel(s):
    # Lowercase everything, then capitalize every part except the first
    parts = s.lower().split('_')
    parts[1:] = [x.capitalize() for x in parts[1:]]
    return ''.join(parts)

df = df.rename(columns=to_camel)

Output:

   rid     runDate  predVolumeNext360
0    1  2023-01-01                100
1    2  2023-01-02                150
2    3  2023-01-03                200
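
The regex variant above can also be checked on plain strings, without a DataFrame. This is only a minimal sketch using the standard `re` module with the same `_(.)` pattern; the helper name `snake_to_lower_camel` is made up for illustration.

```python
import re

def snake_to_lower_camel(name):
    # Lowercase the whole name, then turn every "_x" into "X".
    # (Hypothetical helper name, not part of pandas.)
    return re.sub(r'_(.)', lambda m: m.group(1).upper(), name.lower())

names = ['RID', 'RUN_DATE', 'PRED_VOLUME_NEXT_360']
converted = [snake_to_lower_camel(n) for n in names]
print(converted)  # ['rid', 'runDate', 'predVolumeNext360']
```

Note that `'3'.upper()` is still `'3'`, which is why the `_360` suffix simply loses its underscore.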
Suraj Shourie On

Looking at your code, I would say that your prompts for GPT were not accurate.

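You can use split instead of replace and apply title to every element after index 0. Lowercasing inside the comprehension makes it work even if the names are still upper case. See the code below:

df.columns = [''.join(x if i == 0 else x.title() for i, x in enumerate(col.lower().split("_"))) for col in df.columns]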
print(df.columns)

Output:

Index(['rid', 'runDate', 'predVolumeNext360'], dtype='object')
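
One caveat worth knowing: `str.title()` and `str.capitalize()` differ when a segment starts with a digit. The sketch below assumes a hypothetical column name `VOL_3D_AVG` (not from the question) and a made-up helper `camel_with`, just to show the difference.

```python
def camel_with(method, name):
    # Split on underscores, keep the first part lowercase,
    # and apply `method` to every following part.
    parts = name.lower().split('_')
    return parts[0] + ''.join(method(p) for p in parts[1:])

name = 'VOL_3D_AVG'  # hypothetical column name for illustration

with_title = camel_with(str.title, name)
with_capitalize = camel_with(str.capitalize, name)

print(with_title)       # vol3DAvg  (title() uppercases the letter after the digit)
print(with_capitalize)  # vol3dAvg  (capitalize() only touches the first character)
```

For the column names in the question both methods give the same result, so this only matters if your real names contain segments like `3d`.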
Timur Shtatland On

Define separate functions for upper and lower camel case, for clarity:


import pandas as pd

def to_camel_case(snake_str):
    return "".join(x.capitalize() for x in snake_str.lower().split("_"))

def to_lower_camel_case(snake_str):
    # Build the upper camel case string first (every component capitalized),
    # then lowercase only its first character.
    camel_string = to_camel_case(snake_str)
    return camel_string[:1].lower() + camel_string[1:]

# Sample DataFrame with column names
data = {'RID': [1, 2, 3],
        'RUN_DATE': ['2023-01-01', '2023-01-02', '2023-01-03'],
        'PRED_VOLUME_NEXT_360': [100, 150, 200]}

df = pd.DataFrame(data)

# Convert column names to camel case with lowercase starting letter
df.columns = [to_lower_camel_case(col) for col in df.columns]

# Print the DataFrame with updated column names
print(df)

Prints:

   rid     runDate  predVolumeNext360
0    1  2023-01-01                100
1    2  2023-01-02                150
2    3  2023-01-03                200

The methods are based on this answer by jbaiter.

Andrés Vázquez Aviña On
df.columns = [col[0].lower() + col.replace('_', ' ').title().replace(' ', '')[1:] for col in df.columns]

Use this line in place of the list comprehension just before the final print. It did the trick for me.
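
For context, here is a small sketch of why the original line leaves the first letter capitalized. After `df.columns.str.lower()`, `col[0]` is already lowercase, so `.replace(col[0], col[0].lower(), 1)` replaces a character with itself and changes nothing. The variable names below are only for illustration.

```python
col = 'run_date'  # a column name after the lowercase step in the question

# Same chain as in the question, up to the final replace
titled = col.replace('_', ' ').title().replace(' ', '')

# Original attempt: replaces 'r' with 'r', so the capital 'R' stays
broken = titled.replace(col[0], col[0].lower(), 1)

# Slicing instead: lowercase first character plus the rest of the titled string
fixed = col[0].lower() + titled[1:]

print(titled)  # RunDate
print(broken)  # RunDate
print(fixed)   # runDate
```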

alec_djinn On

I would use str.capitalize() in a tiny function like the following:

def snake_to_camel(snake_string):
    s = snake_string.lower().split('_')
    return ''.join([s[0]]+[i.capitalize() for i in s[1:]])


print(snake_to_camel("TEST_CASE_number1")) #'testCaseNumber1'

df.columns = list(map(snake_to_camel, df.columns))