Separate numerical values from alphabetic characters within a column as well as expand dates

27 views Asked by At

I would like to separate the numerical values from the alphabetic characters within a column as well as to expand the quarters in a date.

Data

state   stat1   type    stat2   qtr
NY      up      AAA01   Gr      Q1 24
NY      up      AAA02   Re      Q1 24
NY      up      BB01    Gr      Q1 24
NY      up      DD01    Gr      Q1 24
NY      up      DD02    Gr      Q1 24
CA      low     AAA01   Re      Q2 24
CA      low     DD01    Re      Q2 24
CA      low     AAA01   Re      Q2 24
CA      low     SSS01   Gr      Q2 24

Desired

state   stat1   type    stat2   qtr
NY      up      AAA     Gr      Q1 2024
NY      up      AAA     Re      Q1 2024
NY      up      BB      Gr      Q1 2024
NY      up      DD      Gr      Q1 2024
NY      up      DD      Gr      Q1 2024
CA      low     AAA     Re      Q2 2024
CA      low     DD      Re      Q2 2024
CA      low     AAA     Re      Q2 2024
CA      low     SSS     Gr      Q2 2024

Doing

# Extract year from the 'qtr' column
df['year'] = df['qtr'].apply(lambda x: x.split(' ')[-1])

# Modify the 'type' column to include only the first three characters
df['type'] = df['type'].str[:3]

# Concatenate 'qtr' and 'year' columns
df['qtr'] = df['qtr'].apply(lambda x: x.split(' ')[0] + ' ' + x.split(' ')[-1])

However the output does not fully remove the numerical values from the characters within a column. The quarter transformation is not correct as well. Any suggestion is appreciated.

1

There are 1 answers

0
PaulS On BEST ANSWER

A possible solution:

df.assign(type = df['type'].str.replace(r'\d*$', '', regex=True),
          qtr = df['qtr'].str.replace(' ', ' 20'))

Output:

  state stat1 type stat2      qtr
0    NY    up  AAA    Gr  Q1 2024
1    NY    up  AAA    Re  Q1 2024
2    NY    up   BB    Gr  Q1 2024
3    NY    up   DD    Gr  Q1 2024
4    NY    up   DD    Gr  Q1 2024
5    CA   low  AAA    Re  Q2 2024
6    CA   low   DD    Re  Q2 2024
7    CA   low  AAA    Re  Q2 2024
8    CA   low  SSS    Gr  Q2 2024