I have a pandas DataFrame with a column named Period that holds three different labels, which mean winter 2019, winter 2020 and winter 2021, as shown below.

My question is: how do I replace this format so that the end result is winter 2019, winter 2020 and winter 2021?

Period:
Q4 '19+Q1 '20 
Q4 '20+Q1 '21
Q4 '21+Q1 '22

Q means a quarter of the year.

My approach: first, use a regex to make a new year column by extracting the '19, '20, '21 year number that follows the first Q4:

gas['year'] = gas['Period'].str.extract(r"([']\d\d)", expand=True)
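A side note on the extract step: a group such as ([']\d\d) captures the apostrophe as well ("'19"). A minimal sketch, assuming the column holds strings like the ones shown and that every year is in the 2000s, captures only the two digits after the first apostrophe and turns them into a four-digit year. The sample data is illustrative.

```python
import pandas as pd

# Illustrative data shaped like the question's Period column
gas = pd.DataFrame({"Period": ["Q4 '19+Q1 '20 ", "Q4 '20+Q1 '21", "Q4 '21+Q1 '22"]})

# Capture only the two digits after the first apostrophe (the Q4 year).
# expand=False returns a Series rather than a one-column DataFrame.
two_digit = gas["Period"].str.extract(r"'(\d{2})", expand=False)

# Assumption: every year is in the 2000s, so prefix "20".
gas["year"] = "20" + two_digit

print(gas["year"].tolist())  # ['2019', '2020', '2021']
```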

Then I planned to replace values containing Q4 and Q1 with winter using:

gas[(gas['Period'].str.contains('Q4')) & (gas['Period'].str.contains('Q1'))] = 'winter Gregorian'

but it overwrote every column in the rows containing Q4 and Q1, not just Period.

I also tried:
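The whole row gets overwritten because `gas[mask] = value` assigns to every column of the selected rows. A minimal sketch, assuming the same column layout, restricts the assignment to the Period column with `.loc[mask, "Period"]`; the extra `price` column is hypothetical and only there to show that other columns survive.

```python
import pandas as pd

gas = pd.DataFrame({
    "Period": ["Q4 '19+Q1 '20", "Q4 '20+Q1 '21", "Q2 '21+Q3 '21"],
    "price": [1.0, 2.0, 3.0],  # hypothetical extra column that must survive
})

# Rows whose Period mentions both Q4 and Q1
mask = gas["Period"].str.contains("Q4") & gas["Period"].str.contains("Q1")

# Assign only into the Period column of the matching rows
gas.loc[mask, "Period"] = "winter Gregorian"

print(gas)
```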

gas['Period'].str.replace(r"[Q][4]..\d\d[+][Q][1]", 'winter Gregorian', regex=False)

The end result I would like is something like:

Period
winter Gregorian 2019
winter Gregorian 2020
winter Gregorian 2021

Neither attempt worked. Any advice or suggestions are welcome, thanks.
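On the str.replace attempt: passing regex=False makes pandas treat the pattern as literal text, so the bracket expressions never match anything. A minimal sketch, assuming every value has the form Q4 'YY+Q1 'YY and that all years are in the 2000s, uses regex=True with a capture group and a backreference to produce the desired labels in one step. The sample DataFrame is illustrative.

```python
import pandas as pd

gas = pd.DataFrame({"Period": ["Q4 '19+Q1 '20 ", "Q4 '20+Q1 '21", "Q4 '21+Q1 '22"]})

# Group 1 captures the two-digit year after Q4; \1 reinserts it after "20".
gas["Period"] = (
    gas["Period"]
    .str.strip()  # the first sample value has a trailing space
    .str.replace(r"^Q4 '(\d{2})\+Q1 '\d{2}$", r"winter Gregorian 20\1", regex=True)
)

print(gas["Period"].tolist())
```

Values that do not match the pattern are left unchanged, which makes it easy to spot rows in an unexpected format.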

2 Answers


I would make use of regex capture groups here. Take a look at this regular expression:

(Q[0-9]) '([0-9]{2})\+(Q[0-9]) '([0-9]{2})

Each item enclosed in parentheses is a group that you can access after matching. For example, in Q4 '19+Q1 '20, the first group in the match is Q4, the second group is 19, the third group is Q1, and the fourth group is 20.
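A quick check of those groups with Python's re module on the first sample value (a raw string keeps the backslash in \+ intact):

```python
import re

pattern = r"(Q[0-9]) '([0-9]{2})\+(Q[0-9]) '([0-9]{2})"
match = re.match(pattern, "Q4 '19+Q1 '20")

print(match.groups())                  # ('Q4', '19', 'Q1', '20')
print(match.group(1), match.group(2))  # Q4 19
```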

After matching your Period strings against this regex, you can pull out the group matches to construct your desired output format. This code gives you a complete example of how to do this.

import re
import pandas as pd

df = pd.DataFrame({
    "Period": [
        "Q4 '19+Q1 '20 ",
        "Q1 '20+Q2 '20",
        "Q4 '20+Q1 '21",
        "Q4 '21+Q1 '22"
    ]
})

# Raw string so that \+ reaches the regex engine unchanged
pattern = r"(Q[0-9]) '([0-9]{2})\+(Q[0-9]) '([0-9]{2})"
season_map = {
    ('Q4', 'Q1'): 'Winter',
    ('Q1', 'Q2'): 'Spring',
    ('Q2', 'Q3'): 'Summer',
    ('Q3', 'Q4'): 'Fall'
}

def convert_time_format(x):
    match = re.match(pattern, x)
    if match is None:
        return 'Failed to parse'
    # The season is keyed on the (first quarter, second quarter) pair
    season = season_map.get((match.group(1), match.group(3)))
    if season is None:
        # Unknown quarter pair: avoid None + str raising a TypeError
        return 'Failed to parse'
    year = match.group(2)  # two-digit year of the first quarter
    return season + ' ' + year

df.Period.map(convert_time_format)

That gives:

0    Winter 19
1    Spring 20
2    Winter 20
3    Winter 21
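To get the question's exact format ("winter Gregorian 2019"), a variant can pull out all four groups at once with Series.str.extract and build the label column directly. This is a sketch, not the answer's method: it assumes the same pattern, a lower-case version of the season map, the "Gregorian" wording taken from the question, years in the 2000s, and that every row matches the pattern.

```python
import pandas as pd

df = pd.DataFrame({"Period": ["Q4 '19+Q1 '20 ", "Q1 '20+Q2 '20", "Q4 '20+Q1 '21", "Q4 '21+Q1 '22"]})

pattern = r"(Q[0-9]) '([0-9]{2})\+(Q[0-9]) '([0-9]{2})"
season_map = {("Q4", "Q1"): "winter", ("Q1", "Q2"): "spring",
              ("Q2", "Q3"): "summer", ("Q3", "Q4"): "fall"}

# One column per capture group, labelled 0..3
parts = df["Period"].str.extract(pattern)

# Look up the season from the (first quarter, second quarter) pair
season = pd.Series(
    [season_map.get((a, b)) for a, b in zip(parts[0], parts[2])],
    index=df.index,
)

# Assumption: two-digit years belong to the 2000s
df["Label"] = season + " Gregorian 20" + parts[1]
print(df["Label"].tolist())
```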

So I wrote my own version to look for every month name and reformat it, but got this error: TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

import re

pattern3 = r"(\w+) '([0-9]{2})"
month_map = {
    'January': 'Jan',
    'February': 'Feb',
    'March': 'Mar',
    'April': 'Apr',
    'May': 'May',
    'June': 'Jun',
    'July': 'Jul',
    'August': 'Aug',
    'September': 'Sep',
    'October': 'Oct',
    'November': 'Nov',
    'December': 'Dec',
}

def convert_month(x):
    match = re.match(pattern3, x)
    if match is not None:
        # month_map.get returns None when the first word is not a month name
        # (e.g. "Q4" in "Q4 '19+Q1 '20"); None + '_' then raises the TypeError
        month = month_map.get(match.group(1))
        year = '20' + match.group(2)
        return month + '_' + year
    else:
        return x

gas['Period'] = gas.Period.map(convert_month)

Sorry Luke, I may have to bother you again.
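The TypeError comes from month_map.get returning None: the pattern (\w+) '([0-9]{2}) also matches quarter strings such as "Q4 '19+Q1 '20" (group 1 is "Q4"), and any first word that is not a key of month_map gives None, so month + '_' fails. A minimal sketch of a fix, assuming the goal is to turn full English month names like "October '21" into "Oct_2021" and leave everything else untouched: anchor the pattern, build the abbreviations from the standard-library calendar module instead of a hand-written map, and fall back to the original value when the first word is not a month name. The sample values are illustrative.

```python
import calendar
import re

import pandas as pd

# Full month name -> three-letter abbreviation, e.g. "October" -> "Oct"
# (calendar follows the current locale; the default C locale gives English names)
month_map = {calendar.month_name[i]: calendar.month_abbr[i] for i in range(1, 13)}

# Anchored: one word, a space, an apostrophe, two digits, nothing else
pattern3 = r"^(\w+) '(\d{2})$"

def convert_month(x):
    match = re.match(pattern3, x)
    if match is None:
        return x
    month = month_map.get(match.group(1))
    if month is None:  # first word is not a month name, e.g. "Q4"
        return x
    return month + '_20' + match.group(2)  # assumes years in the 2000s

gas = pd.DataFrame({"Period": ["October '21", "May '22", "Q4 '21+Q1 '22"]})
gas['Period'] = gas.Period.map(convert_month)
print(gas['Period'].tolist())  # ['Oct_2021', 'May_2022', "Q4 '21+Q1 '22"]
```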